Community detection in networks refers to the process of seeking groups of nodes that are strongly connected internally and weakly connected externally. In this work, we introduce and study a community definition based on internal edge density. Beginning with the simple concept that edge density equals the number of edges divided by the maximal number of edges, we apply this definition to a variety of node and community arrangements to show that our definition yields sensible results. Our community definition is equivalent to that of the Absolute Potts Model community detection method (Phys. Rev. E 81, 046114 (2010)), and the performance of that method validates the usefulness of our definition across a wide variety of network types. We discuss how this definition can be extended to weighted graphs and multigraphs, and how it is capable of handling overlapping communities and local algorithms. We further validate our definition against the recently proposed Affiliation Graph Model (arXiv: 1205.6228 [cs.SI]) and show that we can precisely solve these benchmarks. Rather than proposing an end-all community definition, we explain why studying the detailed properties of community definitions is important in order to validate that definitions do not have negative analytic properties. We urge that community definitions be separated from community detection algorithms, and propose that community definitions be further evaluated by criteria such as these.



I. INTRODUCTION 

It is standard to use the concepts of graph theory in order to represent the interactions of complex systems. Here, the nomenclature of "nodes" and "edges" is used to represent generic items and the interactions between them [1]. One important form of structure in graph theory is that of communities, or strongly connected subgroups [2]. There is no single agreed-upon definition for communities, but it is generally accepted that communities are groupings of nodes that are strongly connected to each other and weakly connected to nodes in other communities. It is important to note the two uses of the term "community" here. The first is a real-world grouping of objects, sometimes known as "ground truth" in other works [3]. This grouping is not precisely mathematically defined, but is empirically defined based on the (e.g., social, anthropological, or biological) data upon which the graph is created [3HTU]. The second form of "community" is a mathematical construction of nodes and edges. The community detection field has two goals: the definition of mathematical communities that most closely correspond to real-world communities, and the development of algorithms that can locate these mathematical communities within graphs.

Many community definitions have been proposed in the literature. Several reviews are dedicated to an overview and comparison of community detection (CD) methods and, more specifically, of community definitions themselves [21 [3l QT]. Often, primary emphasis is placed on the description and workings of the community detection method itself, and the method's particular community definition is only implicitly defined as the practical result of applying the CD algorithm, rather than as a separate formal definition. This has been accepted as a practical thing to do, but much more could be learned if the scope of study of community definitions were broader.

The Girvan-Newman modularity is one of the few heavily studied community definitions [T2TfT6]. Modularity weighs internal edges against external edges in an attempt to indicate, without any user-input parameters, a "best" community structure for any type of graph. Modularity is one of the oldest and most emphasized of the existing community definitions, but despite its utility, an important caveat arises in the study of its (ground-state) community definition. In 2006, Fortunato and Barthelemy showed that modularity has a very interesting property: namely, an implicit dependence on the total size of the network to which it is applied [17, 18]. This prevents modularity-based community detection methods from resolving small communities in a large graph. In short, the optimal community in one part of a graph depends upon properties of the graph far away. This non-intuitive behavior is referred to as a "resolution limit." Several attempts have been made to produce multi-resolution modularity measures, which have a tunable parameter that "zooms" in or out and controls the size of detected communities [19], but these too have been shown to suffer from resolution limits [20, 21]. Modularity has taught us several things. First and foremost, community definitions need theoretical study. Second, any global community definition may have a resolution limit, necessitating local community definitions.

In order to understand the relevance of the work presented here, it is useful to look at past community detection approaches from the standpoint of optimizing over some configurational space. Instead of trying to improve sampling techniques, we examine a different cost function, which gives a different ground state that may be "more correct" than another ground state. As an example, after the introduction of modularity as a measure of community structure, many attempts were put forth to improve community detection via enhanced sampling of the configuration space of community assignments in order to better optimize modularity [1311151 |2"1H52"]. As useful as these methods are, they all share one fundamental limitation: they rely on the assumption that modularity is the correct cost function to optimize and that community detection is limited only by the ability to properly sample configuration space [21], basically, by overcoming kinetic barriers in minimization dynamics. On the other hand, it is natural to place an emphasis on understanding other ground-state community definitions and their properties, i.e., the mapping from real-world to mathematical communities, before focusing on the search for optimal partitions. The viewpoint we espouse is that only after understanding ground-state characteristics should one focus on the sampling required to find that ground state. There are few such analyses in the existing literature, but the number is growing.

In this work, we propose a community definition based on internal edge density (abbreviated simply as "edge density"). Edge density-based definitions have been considered before in a wide variety of contexts [3j EJ [331437]. The particular edge density definition we focus on has been used in an implicit fashion previously [55], but has never been explicitly defined and extensively studied, as we do here. Our definition affords us the ability to compare the properties of graph-based communities to the properties of real-world communities as a means of quantifying the degree to which they match. Currently, the most common method of assessing community detection algorithms is to choose a test graph and run community detection. This gives a biased perspective of community detection, since it starts from some initial belief about the structure of communities and optimizes methods towards that definition. Recently, there has been much-needed emphasis on the comparison of community definitions to real-world communities. In order to make progress, we show how edge density relates to actual properties of model graphs and communities.

In addition to the above benefit of edge density-based communities, edge density can quite naturally be extended to weighted graphs and multigraphs, and can handle overlapping communities, all important areas of modern research in community detection. In particular, few algorithms have been proposed that are capable of detecting overlapping communities. Thus, our work not only provides a conceptual breakthrough in terms of a rigorous analysis of new community definitions, but also provides a practical benefit in community detection ability. As an outgrowth of these extensions, we propose a new variable topology Potts model, which allows a more natural means of community detection in heavily weighted graphs.

To provide a concrete illustration of these claims, we turn to a recently proposed social network model. In particular, we focus on recent work of Yang and Leskovec (YL), who have proposed a new model of social networks, the Affiliation Graph Model (AGM) [40]. This model takes into account features observed in real-world social networks, found via comparison with online social networks. YL claim that no current community detection algorithms can describe and detect communities in these graphs. We conceptually show that the edge density community definition models this graph properly, and then perform actual community detection to demonstrate that this is the case.

We do not claim here that edge density is a universal 
definition of community that applies to all possible com- 
munities in all possible graphs. Instead, we provide the 
tools to determine if any one particular class of graphs 
is well described by an edge density picture, either by an 
analysis of the graph generation process, or properties of 
the edge structure. We hope to inspire similarly detailed 
analysis, and more rigorous comparison, of existing and 
future community definitions. 

The remainder of this work is organized as follows. In section II we define the basic tools needed to quantify edge density, then state our edge density definition. In section III we describe historic and new edge density models, and the Potts model framework we use to perform actual community detection. In section IV we describe some universal properties of edge density, which are independent of the exact model used to perform community detection. In section V we describe how certain models handle the boundaries between communities differently, and explain our model of choice. In section VI and the Appendix we discuss practical considerations for constructing an actual algorithm employing edge density. In section VII we validate the use of edge density as a scale parameter. In section VIII we discuss general considerations for overlapping communities and, contrasting with some recent work, we show that edge densities detect the correct conceptual behavior. In section IX we rigorously correlate our edge density to affiliation graph models, and show that we can detect these communities easily. In section X we describe the extension to weighted graphs, and how our new variable topology Potts model provides a significant improvement over older models. In section XI we discuss various possible extensions. In section XII we discuss some general background and address some possible limitations of edge density. In section XIII we discuss the danger of over-optimizing to particular benchmark models and why studies of edge density are useful nonetheless. In section XIV we provide concluding remarks and discuss directions for future research.



II. EDGE DENSITY 

We must first define the nomenclature and measures we will use. We will specify communities or groups of nodes by capital Latin letters such as $A$ and $B$, and individual nodes by lowercase letters such as $a$, $b$, $x$, and $y$. In particular, the node $a$ shall represent an arbitrary node of community $A$, and specific nodes of community $A$ are indexed as $a_i$. The nodes $x$ and $y$ generally represent nodes which are not currently in any community. The community $A$ will have $n_A$ nodes.

We use set terminology to discuss communities and groups of nodes. Set union is denoted with "$\cup$", indicating every node on the left or right or both sides of the operator. Set intersection is denoted "$\cap$", indicating every node that is on both sides of this operator. We use this nomenclature loosely, allowing constructs such as "$A \cup x$" even though we are operating on a community on the left and an individual node on the right. This should be taken to mean the union of the set of nodes in community $A$ and the set containing $x$.

For any group of nodes (explained below), $l$ represents the maximal number of possible edges ("links" in our nomenclature). The variable $e$ represents the actual number of edges. The edge density $\rho$ is the fraction of these links which have edges actually present,

$$\rho = \frac{e}{l}. \qquad (1)$$

We can calculate the edge density within and between a variety of different groupings of nodes, all of which can be useful under different circumstances. Fig. 1 illustrates the most common situations. (a) We can calculate the edge density within only one community $A$, in which case $l = \frac{1}{2}n_A(n_A - 1)$. Community edge density is used to quantify the absolute community size and scale. (b) We can calculate the edge density between a community $B$ and a node $x$ not currently in community $B$, in which case $l = n_B$, since that node can connect once to every node in $B$. This is useful when deciding if a node should enter or leave a community. (c) We can calculate the edge density between two communities $C$ and $C'$, in which case $l = n_C n_{C'}$, since every node in $C$ can connect once to every node in $C'$. This is useful when testing if two communities should merge. (d) We can calculate the edge density for the case of one node $y$ which has edges to two communities $D$ and $D'$, in which case $l_{Dy} = n_D$ and $l_{D'y} = n_{D'}$. This is useful for deciding which of the two communities the node $y$ would prefer to join. (e) By convention, when we calculate the edge density between a community $E$ and a node $e'$ which is currently within that community, we only consider links between $e'$ and the $n_E - 1$ other nodes of the community. Thus, in this case, $l_{Ee'} = n_E - 1$ instead of $n_E$, in contrast to case (b).
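The five cases above reduce to counting links $l$ and edges $e$ over different node sets. The following Python sketch is illustrative only: it assumes an unweighted, undirected graph stored as a set of `frozenset` edges, and the helper names (`make_edges`, `community_density`, `node_density`, `pair_density`) are hypothetical rather than part of any published code. The usage lines reproduce the counts quoted in the Fig. 1 caption.

```python
from itertools import combinations


def make_edges(pairs):
    """Undirected, unweighted edge set: each edge is a frozenset {u, v}."""
    return {frozenset(p) for p in pairs}


def count_edges(edges, x, nodes):
    """Number of edges between node x and the given nodes."""
    return sum(1 for y in nodes if frozenset((x, y)) in edges)


def community_density(edges, A):
    """Case (a): rho_A = e_A / l_A with l_A = n_A (n_A - 1) / 2."""
    A = list(A)
    l = len(A) * (len(A) - 1) // 2
    if l == 0:
        return 1.0  # single-node convention (used later for the APM)
    e = sum(1 for u, v in combinations(A, 2) if frozenset((u, v)) in edges)
    return e / l


def node_density(edges, B, x):
    """Cases (b) and (e): l = n_B if x is outside B, n_B - 1 if x is inside B."""
    others = [b for b in B if b != x]
    if not others:
        raise ValueError("no links between x and B")
    return count_edges(edges, x, others) / len(others)


def pair_density(edges, C, Cp):
    """Case (c): rho_CC' with l_CC' = n_C * n_C'."""
    e = sum(count_edges(edges, c, Cp) for c in C)
    return e / (len(C) * len(Cp))


# Case (d) is simply two calls to node_density, one per community.

# Fig. 1(a): five nodes with six edges, so rho_A = 6/10.
E_a = make_edges([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)])
print(community_density(E_a, range(5)))
```

Storing edges as `frozenset` pairs makes the lookup order-independent, which matches the undirected graphs considered here.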



A. The edge density community definition 

The most basic form of an edge density community definition (EDCD) for overlapping or isolated communities consists of three parts. (a) A community $A$ of scale $\rho^*$ is a subset of nodes with an internal edge density $\rho_A$ greater than $\rho^*$. (b) Communities are taken to be as large as possible (absorbing as many nodes as possible) as long as $\rho > \rho^*$. (c) A community $A$ of scale $\rho^*$ must have each individual node $a$'s edge density $\rho_{Aa} > \rho^*$.

FIG. 1: Edge density calculations in a variety of situations. For each situation, $l$ is the maximal possible number of edges, $e$ is the actual number of edges, and the edge density $\rho = e/l$. In (a), we see the edge density of one community A, a group of $n = 5$ nodes, with $l = \frac{1}{2}n(n-1) = 10$ total possible links (dotted and solid lines) and $e = 6$ actual edges (solid lines), thus leading to $\rho_A = 6/10$. In (b), we see the edge density definition applied between one node x and a community B. The community B contains eight nodes, for a total number of possible links to node x of $l_{Bx} = 8$. We see four actual links in place, leading to $\rho_{Bx} = 4/8$. Note that the internal connectivity of the community B is irrelevant and not shown. In (c), we see the edge density as applied to two communities C (5 nodes) and C' (4 nodes). Internal edge structure is again irrelevant and not shown. There are a total of $l_{CC'} = n_C n_{C'} = 20$ possible links between these two communities and $e_{CC'} = 5$ edges in place, for an edge density of $\rho_{CC'} = 5/20$. (d) shows the edge density of one node d being pulled between two communities D and D'. The edge density for each half is in analogy to (b). In (e), we see the edge density between a node e and a community E which contains it. This is similar to (b) except that, by convention, we use $n_E - 1$ links as our basis.

Part (a) is the essence of the definition, establishing a scale parameter of the communities. Part (b) is necessary because, without it, we could always prefer smaller, denser communities. For example, a graph could be partitioned into two-node cliques, each with an edge between its nodes, to get communities which all have $\rho = 1$ but do not provide useful information concerning the overall structure of the graph. Part (c) is necessary to ensure that every node is well connected to the community, preventing nodes that are only loosely connected from being added to any community. In the following sections, we will rigorously show that the absolute Potts model and the variable topology Potts model return communities satisfying these criteria. Part (a) will be shown via Eqs. (11) and (18) in Sec. III, and parts (b) and (c) are discussed in Sec. III D.
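Parts (a) and (c) can be checked directly on a candidate node set, whereas part (b) is a maximality condition. The sketch below is a hypothetical illustration under the same edge-set assumption as before. Its `part_b_local` function is an assumed, weaker local version of part (b): it only asks whether some single outside node could join while keeping (a) and (c) true, and does not consider adding several nodes at once.

```python
from itertools import combinations


def make_edges(pairs):
    return {frozenset(p) for p in pairs}


def density(edges, A):
    """Internal edge density rho_A (1.0 for a single node, by convention)."""
    A = list(A)
    l = len(A) * (len(A) - 1) // 2
    if l == 0:
        return 1.0
    return sum(1 for u, v in combinations(A, 2) if frozenset((u, v)) in edges) / l


def member_density(edges, A, a):
    """rho_Aa for a node a inside A: links to the other n_A - 1 members."""
    others = [b for b in A if b != a]
    if not others:
        return 1.0
    return sum(1 for b in others if frozenset((a, b)) in edges) / len(others)


def part_a(edges, A, rho_star):
    return density(edges, A) > rho_star


def part_c(edges, A, rho_star):
    return all(member_density(edges, A, a) > rho_star for a in A)


def part_b_local(edges, nodes, A, rho_star):
    """True if no single outside node can join A while keeping (a) and (c)."""
    for x in set(nodes) - set(A):
        grown = set(A) | {x}
        if part_a(edges, grown, rho_star) and part_c(edges, grown, rho_star):
            return False
    return True


edges = make_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # 4-clique
                    (4, 0),                                          # node 4: one edge
                    (5, 0), (5, 1), (5, 2)])                         # node 5: three edges
nodes = range(6)
A = {0, 1, 2, 3}
print(part_a(edges, A, 0.6), part_c(edges, A, 0.6), part_b_local(edges, nodes, A, 0.6))
```

With $\rho^* = 0.6$, the four-clique satisfies (a) and (c) but is not maximal, because node 5 can join with $\rho_{A5} = 3/4$; node 4, with a single edge into the group, would violate part (c).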



This definition is sufficient when communities are considered in isolation, i.e., when we never need to decide which of several communities a node should join, given that joining would increase $\rho$ for any of them (Fig. 1(d)). In such cases, one may choose to add the node to the larger community, resulting in a still larger community, or to the community with which it shares the higher edge density. Both of these are reasonable options. One simple criterion is the total number of edges: a node $x$ joins community $A$ instead of community $B$ if $e_{Ax} > e_{Bx}$. Existing edge density community definition methods make different choices here, which will be discussed in Sec. V.
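The two tie-breaking options just described can be made concrete. In the hypothetical sketch below (the function name and its `rule` parameter are illustrative assumptions), a node $x$ is only considered for communities with $\rho_{Cx} > \rho^*$. This is exactly part (c) for $x$ after joining, since $x$ would then link to the $n_C$ existing members. The rule then picks either the most shared edges $e_{Cx}$ or the highest density $\rho_{Cx}$.

```python
def make_edges(pairs):
    return {frozenset(p) for p in pairs}


def choose_community(edges, communities, x, rho_star, rule="edges"):
    """Pick a community for outside node x among those with rho_Cx > rho*.

    rule="edges"   -> the candidate sharing the most edges e_Cx with x
    rule="density" -> the candidate with the highest edge density rho_Cx
    Returns None if x qualifies for no community.
    """
    if rule not in ("edges", "density"):
        raise ValueError(f"unknown rule: {rule}")
    scores = {}
    for name, C in communities.items():
        e = sum(1 for c in C if frozenset((x, c)) in edges)
        rho = e / len(C)  # l_Cx = n_C links, as in Fig. 1(b)
        if rho > rho_star:
            scores[name] = e if rule == "edges" else rho
    return max(scores, key=scores.get) if scores else None


A = [f"a{i}" for i in range(10)]
B = [f"b{i}" for i in range(4)]
edges = make_edges([("x", a) for a in A[:6]] + [("x", b) for b in B[:3]])
communities = {"A": A, "B": B}
print(choose_community(edges, communities, "x", 0.5, rule="edges"),
      choose_community(edges, communities, "x", 0.5, rule="density"))
```

Here $x$ shares 6 of 10 possible edges with $A$ but 3 of 4 with $B$, so the two rules disagree, which is precisely the ambiguity that different edge density methods resolve differently.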



B. Connection to graph generation processes 

The edge density variable $\rho$ refers to actual edge densities in a specific graph instance. Many benchmark graphs are generated stochastically, with edges placed within or between communities with some specified probability $p$. Since each link (possible edge) has an edge placed with probability $p$ independently of all other edges, the variable $e$ (number of edges placed) is binomially distributed with mean $lp$ (and probability of success $p$). Each of the situations depicted in Fig. 1 can have a defined $p$. For example, planted $\ell$-partition model (also known as stochastic block model (SBM)) graphs are defined by specifying intra-community edge densities $p_A$, $p_B$, etc. (Fig. 1(a)), and inter-community edge densities $p_{AB}$, $p_{AC}$, $p_{BC}$, etc. (Fig. 1(c)) [41]. Then, each respective $e$ has a binomial distribution with (its respective) $l$ trials and (its respective) probability $p$ of an edge per trial,

$$P[e = x] = B(x; l, p) \approx \mathcal{N}\left(x; lp, lp(1-p)\right). \qquad (2)$$

$B(l, p)$ represents a binomial distribution of $l$ trials with probability $p$ per trial, and $\mathcal{N}(\langle x \rangle, \sigma^2)$ represents a normal distribution with mean $\langle x \rangle$ and variance $\sigma^2$, obtained by making a normal approximation to the binomial distribution. With $\rho = e/l$, the distribution of $\rho$ is



$$P[\rho = x] = B(xl; l, p), \qquad (3)$$

$$P[\rho = x] \approx \mathcal{N}\left(x; p, \frac{p(1-p)}{l}\right). \qquad (4)$$



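In the above, we employed the following shorthand for a (normalized) binomial distribution,

$$B(m; n, p) = \binom{n}{m} p^m (1-p)^{n-m}. \qquad (5)$$

The normal distribution is, explicitly, given by

$$\mathcal{N}(x; \langle x \rangle, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \langle x \rangle)^2}{2\sigma^2}\right). \qquad (6)$$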



Clearly, $\rho$ is a random variable with a mean of $p$ and a standard deviation of $\sqrt{p(1-p)/l}$. We see the intuitive result that, as we approach larger community sizes, we approach the mean-field limit $\rho \to p$ with decreasing standard deviation around this mean. This relation between $\rho$ and an underlying $p$ is valid for any case where all links share the same edge probability.
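The mean and standard deviation quoted above can be checked numerically without random sampling, by summing over the exact binomial distribution of Eq. (3). This Python sketch assumes a single edge probability $p$ shared by all $l$ links; the function names are hypothetical. The exact probabilities use `math.comb`, which is fine for the modest $l$ used here but would overflow a float for very large $l$.

```python
from math import comb, sqrt


def rho_distribution(l, p):
    """Eq. (3): (rho, probability) pairs with rho = k / l for k = 0..l edges."""
    return [(k / l, comb(l, k) * p**k * (1 - p) ** (l - k)) for k in range(l + 1)]


def mean_and_sd(dist):
    mean = sum(r * w for r, w in dist)
    var = sum((r - mean) ** 2 * w for r, w in dist)
    return mean, sqrt(var)


p = 0.3
for n in (5, 10, 30):  # community sizes; l = n(n-1)/2 links
    l = n * (n - 1) // 2
    mean, sd = mean_and_sd(rho_distribution(l, p))
    print(f"n={n:2d} l={l:3d} mean={mean:.6f} sd={sd:.6f} "
          f"predicted sd={sqrt(p * (1 - p) / l):.6f}")
```

The printed mean stays at $p$ while the standard deviation shrinks as $1/\sqrt{l}$, matching the mean-field behavior described in the text.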



III. EDGE DENSITY MODELS 

In the previous section, we stated a basic edge density definition. This definition has been implicitly used in several preexisting community detection methods. Thus, the edge density community definition is not new, but it has never before been explicitly stated and analyzed. Before we proceed to more specific uses of the edge density community definition, we review existing Potts-model-based edge density community definitions.

Recent edge density-based community definitions are formulated in terms of Potts models [32Tl44]. The Potts model is a spin system, with each site having one of up to $q$ associated spin flavors $\sigma = 0, \ldots, q-1$. It is similar to the Ising model, except that the Ising model allows spins of only 0 or 1 (i.e., the Ising model is a Potts model with $q = 2$). For community detection, we take the spin flavor of each site (node) as corresponding to a community assignment. That is, if $\sigma_a = k$, then node $a$ lies in community number $k$ (the index $k$ clearly lies in the range $0 \le k \le q-1$). Inter-site interactions, which define a Hamiltonian, are minimized to find an optimal community assignment. This may be done by penalizing inter-community edges and favoring intra-community edges. Reichardt and Bornholdt explain the balances and equivalences of sums of internal and external edges. The Reichardt-Bornholdt (RB) Potts model was the first explicit Potts model used for community detection, but it does not use our edge density community definition [THl]. Instead, the RB Potts model is equivalent to the Girvan-Newman modularity. Because the RB Potts model does not correspond to the edge density definitions in this paper, it is not considered in the following analysis.

Potts models are ideal for edge density definitions because they consist primarily of a sum over all edges [45]. The models allow us to select only internal edges, although it is equally possible to select only external edges. One possible limitation of Potts models is that each node can only be associated with one community. This would seem to exclude the possibility of overlapping communities. In order to extend the Potts models to allow overlapping communities, we reformulate the models away from an edge-centric definition towards an equivalent community-centric definition.



A. Absolute Potts model 

The absolute Potts model (APM) is a spin-based ferromagnetic community detection method [38, 39]. It has been shown to be very accurate at community detection under a wide variety of circumstances [38, 39, 4(3 HZ]. The APM implicitly uses the edge density community definition. In the Potts representation of the APM, the CD problem is mapped onto an energy minimization of a Hamiltonian defined over pairs of spins,

$$E = -\frac{1}{2}\sum_{a \neq a'} \left(A_{aa'} - \gamma B_{aa'}\right)\delta(\sigma_a, \sigma_{a'}). \qquad (7)$$

The sum above is over distinct nodes $a$, $a'$ with, as explained above, $\sigma_a$ denoting the community assignment of node $a$, and $A_{aa'}$ being the weighted "adjacency matrix". In unweighted graphs, $A_{aa'} = 1$ if an edge is present between nodes $a$ and $a'$, and $A_{aa'} = 0$ otherwise. For weighted graphs, described further in Sec. X, $A_{aa'} = w$, where $w$ is the respective edge weight. In unweighted graphs, we define the "inverse adjacency matrix" to have elements $B_{aa'} = 1$ if there is no edge present, and zero otherwise. For more general unweighted graphs, we will set $B_{aa'} = 1 - A_{aa'}$. In these models, lower energies are attractive, and any shifts in community assignment that lower the energy are favorable.

The Hamiltonian in Eq. (7) involves a sum over all pairs of nodes. However, due to the presence of the Kronecker delta (i.e., $\delta(\sigma_a, \sigma_{a'}) = 1$ if $\sigma_a = \sigma_{a'}$ and zero otherwise), the sum is inherently local. That is, the sum contains only pairs of nodes which are in the same community. Each intra-community edge has a favorable (negative) energy of $-1$, and each missing intra-community edge has an energy penalty (positive energy) of $\gamma$. While typically represented within a spin formulation, this can be recast with the primary sum over communities instead of over edges:

$$E = -\sum_{A} \sum_{(a, a' \neq a) \in A} \frac{1}{2}\left(A_{aa'} - \gamma B_{aa'}\right), \qquad (8)$$

with $(a, a')$ being all ordered pairs of nodes in community $A$. By the term "ordered", we make evident that we include both the pair $(a, a')$ and the pair $(a', a)$ (for $a \neq a'$) in the sum. The spin formulation appears conceptually simpler and may be implemented much more efficiently, although the community-centric definition allows us to make additional theoretical progress.

Since the inner sum of $\frac{1}{2}A_{aa'}$ is the number of edges $e_A$ in community $A$, and the inner sum of $\frac{1}{2}B_{aa'}$ is the number of missing edges in community $A$, we can rewrite this as

$$E = -\sum_{A} \left(e_A - \gamma(l_A - e_A)\right), \qquad (9)$$

where the inner sum has been rewritten in terms of the number of existing edges and the number of missing edges $l_A - e_A$. With minor rearrangement and invoking our edge density definition $\rho = e/l$, we may rewrite the energy in terms of edge density with a prefactor of $l_A$,

$$E = -\sum_{A} l_A\left(\rho_A - \gamma(1 - \rho_A)\right). \qquad (10)$$

The energy is now written as a sum over edge densities. We see that for the energy of any one community to be negative (attractive, with a binding energy), the term $\rho_A - \gamma(1 - \rho_A)$ must be positive. Rearranging, this gives us a correspondence between the APM variable $\gamma$ and the critical (minimum) edge density $\rho^*$,

$$\rho_A > \frac{\gamma}{1 + \gamma} \equiv \rho^*. \qquad (11)$$

Eq. (11) is the fundamental relationship between the APM variable $\gamma$ and the critical edge density $\rho^*$. It shows that any community returned by the APM must satisfy part (a) of the edge density community definition. We can now state two equivalent relationships which connect the APM and edge density viewpoints:

$$\rho^* = \frac{\gamma}{1 + \gamma}, \qquad (12)$$

$$\gamma = \frac{\rho^*}{1 - \rho^*}. \qquad (13)$$

We can rewrite the APM Hamiltonian in terms of $\rho^*$ instead of $\gamma$ using Eqs. (10) to (13),

$$E = -\sum_{A} l_A \frac{\rho_A - \rho^*}{1 - \rho^*}. \qquad (14)$$

We can now relate the previously existing APM to our new edge density community definition. First, if $\rho_A > \rho^*$, the energy of community $A$ is negative, and the community thus has a binding energy. Therefore, all communities $A$ must have $\rho_A > \rho^*$, corresponding to part (a) of the edge density community definition. Second, $1/(1 - \rho^*)$ and $l_A = \frac{1}{2}n_A(n_A - 1)$ are scale factors. In order to minimize the energy (and create the best partition according to the APM), we want $l_A$ to be as large as possible (larger communities), and also $\rho_A$ to be as large as possible: because of the factor $(\rho_A - \rho^*)$, increasing the edge density of a community also lowers the energy. As we will see in Sec. VII, larger community sizes generally tend to imply smaller $\rho$; thus larger community size and larger $\rho$ are competing factors which must be balanced. Eq. (14) quantifies that balance. Communities with single nodes ("size one" communities) have $E = 0$ according to Eq. (9), and this can be represented in Eq. (14) if we define a community consisting of only one node to have $\rho = 1$.
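As a consistency check on Eqs. (7) to (14), the sketch below evaluates the APM energy of a small unweighted graph both as the pairwise spin sum of Eq. (7), with $B_{aa'} = 1 - A_{aa'}$, and in the edge density form of Eq. (14), using Eq. (13) to convert $\rho^*$ into $\gamma$. The test graph (two triangles joined by one edge) and all function names are hypothetical examples, not taken from this work.

```python
from itertools import combinations


def make_edges(pairs):
    return {frozenset(p) for p in pairs}


def apm_energy_pairs(edges, partition, gamma):
    """Eq. (7) for an unweighted graph, with B_aa' = 1 - A_aa'."""
    label = {v: i for i, C in enumerate(partition) for v in C}
    E = 0.0
    for a in label:
        for b in label:
            if a != b and label[a] == label[b]:  # Kronecker delta
                A_ab = 1.0 if frozenset((a, b)) in edges else 0.0
                E -= 0.5 * (A_ab - gamma * (1.0 - A_ab))
    return E


def apm_energy_density(edges, partition, rho_star):
    """Eq. (14): E = -sum_A l_A (rho_A - rho*) / (1 - rho*)."""
    E = 0.0
    for C in partition:
        l = len(C) * (len(C) - 1) // 2
        if l == 0:
            continue  # single-node communities contribute E = 0
        e = sum(1 for u, v in combinations(C, 2) if frozenset((u, v)) in edges)
        E -= l * (e / l - rho_star) / (1 - rho_star)
    return E


def gamma_from_rho(rho_star):  # Eq. (13)
    return rho_star / (1 - rho_star)


def rho_from_gamma(gamma):  # Eq. (12)
    return gamma / (1 + gamma)


# Two triangles joined by the single edge (2, 3).
edges = make_edges([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
split = [[0, 1, 2], [3, 4, 5]]
whole = [list(range(6))]
rho_star = 0.5
for name, part in (("split", split), ("whole", whole)):
    print(name,
          apm_energy_pairs(edges, part, gamma_from_rho(rho_star)),
          apm_energy_density(edges, part, rho_star))
```

For $\rho^* = 0.5$ the two triangles have energy $-6$, while the single merged community, with $\rho = 7/15 < \rho^*$, has positive energy $+1$ and is therefore disfavored.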



B. Variable topology Potts model 

The absolute Potts model [551 13H] has, by now, been introduced and studied in earlier works. The variable topology Potts model (VTPM), which we now introduce, has been hinted at in previous literature, but never defined nor extensively analyzed [22, 48]. A somewhat similar technique was used in Ref. [47], where weights were shifted by a value $V$. The approach of Ref. [47] requires an adjustment of two variables, $V$ and $\gamma$, in order to find optimal partitions, while the technique of this section only requires adjustment of $\rho^*$.

The defining property of the variable topology Potts model is that it assigns a constant penalty (of size $\rho^*$) to all links, as opposed to only missing edges. The VTPM Hamiltonian is

$$E = -\frac{1}{2}\sum_{a \neq a'} \left(A_{aa'} - \rho^*\right)\delta(\sigma_a, \sigma_{a'}). \qquad (15)$$

As in the APM, this can be recast into a form which sums over communities first, instead of over all pairs of nodes,

$$E = -\sum_{A} \sum_{(a, a' \neq a) \in A} \frac{1}{2}\left(A_{aa'} - \rho^*\right), \qquad (16)$$

with all variables analogous to the APM nomenclature in Eq. (8). We proceed as in the APM to turn this into an edge density-based definition, using the same relation $e_A = \sum_{(a, a' \neq a) \in A} \frac{1}{2}A_{aa'}$,

$$E = -\sum_{A} l_A\left(\rho_A - \rho^*\right). \qquad (17)$$

We, again, have a criterion which must be satisfied for any VTPM community to have a binding energy,

$$\rho_A > \rho^*, \qquad (18)$$

which is identical to the criterion of the APM and to part (a) of the edge density community definition. In this form, we see that the VTPM insists that all communities have an edge density greater than $\rho^*$, which very naturally corresponds to the edge density community definition. Communities are weighted by the number of links $l_A = \frac{1}{2}n_A(n_A - 1)$.
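The same kind of check can be made for the VTPM. The pairwise form of Eq. (15) and the community form of Eq. (17) should agree, and for an unweighted graph the VTPM energy should equal $(1 - \rho^*)$ times the APM energy of Eq. (9) with $\gamma = \rho^*/(1-\rho^*)$. This sketch assumes edge weights stored in a dictionary keyed by `frozenset` pairs (absent keys mean $A_{aa'} = 0$) and takes $e_A$ to be the summed weight inside $A$; the names are hypothetical.

```python
from itertools import combinations


def vtpm_energy_pairs(weights, partition, rho_star):
    """Eq. (15): sum over ordered same-community pairs, hence the factor 1/2."""
    E = 0.0
    for C in partition:
        for a in C:
            for b in C:
                if a != b:
                    E -= 0.5 * (weights.get(frozenset((a, b)), 0.0) - rho_star)
    return E


def counts(weights, C):
    """Links l_A and (weighted) edge count e_A inside community C."""
    l = len(C) * (len(C) - 1) // 2
    e = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(C, 2))
    return l, e


def vtpm_energy_density(weights, partition, rho_star):
    """Eq. (17): E = -sum_A l_A (rho_A - rho*)."""
    E = 0.0
    for C in partition:
        l, e = counts(weights, C)
        if l:
            E -= l * (e / l - rho_star)
    return E


def apm_energy_counts(weights, partition, gamma):
    """Eq. (9): E = -sum_A (e_A - gamma (l_A - e_A)); meant for unweighted graphs."""
    return -sum(e - gamma * (l - e) for l, e in (counts(weights, C) for C in partition))


weights = {frozenset(p): 1.0
           for p in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
partition = [[0, 1, 2], [3, 4, 5]]
rho_star = 0.5
gamma = rho_star / (1 - rho_star)
print(vtpm_energy_pairs(weights, partition, rho_star),
      vtpm_energy_density(weights, partition, rho_star),
      (1 - rho_star) * apm_energy_counts(weights, partition, gamma))
```

All three printed values coincide, which is the scale-factor relation between the two Hamiltonians discussed next.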

It is illuminating to contrast the VTPM Hamiltonian of Eq. (17) with the APM Hamiltonian of Eq. (14). They appear identical, with the exception of the APM's inclusion of a scale factor of $1/(1 - \rho^*)$. This means that, for unweighted graphs, all past work on the APM applies equally to the VTPM. In particular, we do not need extensive additional testing of the VTPM in order to show that it achieves the same performance. The APM scale factor $1/(1 - \rho^*)$ becomes negative when $\rho^* > 1$. In all of our analysis thus far, this would not seem to be a limitation, but when we consider weighted graphs in Sec. X, it will be. This is the first advantage of the VTPM over the APM.

A "clique" is a group of nodes that are all mutually connected (thus, $\rho = 1$). Normally, community detection methods do not break up cliques, because they are as strongly connected as possible. However, if the edges are weighted, there may be certain "weak" edges where it is reasonable to separate communities. In the APM, weighted cliques cannot be subdivided, since the APM only places energy penalties on missing edges. The VTPM overcomes this limitation by allowing the least weighted edges to become repulsive first as $\rho^*$ is increased: by Eq. (15), an intra-community edge of weight $w$ contributes $-(w - \rho^*)$, which turns positive once $\rho^* > w$. Cliques can then be subdivided at their weakest point. The VTPM is so named because it is able to "change the topology" of these cliques.
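A toy example illustrates the clique-splitting behavior. In the hypothetical four-node weighted clique below, the pairs {0, 1} and {2, 3} are joined by edges of weight 1, and all four cross pairs by weak edges of weight 0.2. The APM sketch assumes $B_{aa'} = 1$ only where no edge exists, as described above, so a fully connected clique is never penalized. The VTPM charges $\rho^*$ on every link, so the weak edges turn repulsive once $\rho^* > 0.2$. The weights and function names are illustrative assumptions.

```python
def same_community_pairs(partition):
    """Ordered pairs (a, a') with a != a' that lie in the same community."""
    for C in partition:
        for a in C:
            for b in C:
                if a != b:
                    yield frozenset((a, b))


def vtpm_energy(weights, partition, rho_star):
    """Eq. (15): every same-community link pays rho*, edge or not."""
    return -0.5 * sum(weights.get(k, 0.0) - rho_star
                      for k in same_community_pairs(partition))


def apm_energy(weights, partition, gamma):
    """Eq. (7), assuming B_aa' = 1 only where no edge exists."""
    return -0.5 * sum(weights[k] if k in weights else -gamma
                      for k in same_community_pairs(partition))


strong = {frozenset(p): 1.0 for p in [(0, 1), (2, 3)]}
weak = {frozenset(p): 0.2 for p in [(0, 2), (0, 3), (1, 2), (1, 3)]}
weights = {**strong, **weak}
merged, split = [[0, 1, 2, 3]], [[0, 1], [2, 3]]

for rho_star in (0.1, 0.5):
    print("VTPM", rho_star,
          vtpm_energy(weights, merged, rho_star), vtpm_energy(weights, split, rho_star))
print("APM", apm_energy(weights, merged, 1.0), apm_energy(weights, split, 1.0))
```

With these weights the VTPM prefers the intact clique for $\rho^* < 0.2$ and the split for $\rho^* > 0.2$, while the APM energy of the merged clique is lower for every $\gamma$ because the clique has no missing edges to penalize.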

The core advantages of the VTPM over the APM are the ability to break up cliques and to handle weighted graphs more naturally. Otherwise, it contains the same information as the APM. Instead of the control parameter $\gamma = \rho^*/(1 - \rho^*)$, it uses $\rho^*$ directly, leading to a much more natural interpretation of the community scale. Because of this, the VTPM provides a compelling community detection algorithm for future use, which will be elaborated on in later sections.



C. Discussion of Potts models 

The APM has been extensively studied and shown to have acceptable performance in community detection across a wide variety of conditions and problems, including common equal-sized and power-law distributed graphs [39, 46]. Since we have shown that the APM directly uses the edge density community definition (although this was not stated explicitly until now), even without additional tests we have strong support for the edge density community definition. While the VTPM has not been as extensively studied, by comparing Eq. (14) and Eq. (17) we see that the Hamiltonians are the same for unweighted graphs, save the scale factor $1/(1 - \rho^*)$. Since this factor is positive for $\rho^* < 1$, both Hamiltonians have the same ground states, which means that the VTPM will be equally powerful in all of the above cases.
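The claim that a positive scale factor leaves the optimum unchanged can be checked by brute force on a tiny graph. This sketch is illustrative and only feasible for a handful of nodes (6 nodes already give 203 partitions): it enumerates every partition, evaluates Eq. (14) and Eq. (17), and compares the sets of minimum-energy partitions. The graph and function names are hypothetical.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def counts(edges, C):
    l = len(C) * (len(C) - 1) // 2
    e = sum(1 for pair in combinations(C, 2) if frozenset(pair) in edges)
    return l, e


def apm(edges, partition, rho_star):  # Eq. (14)
    return -sum(l * (e / l - rho_star) / (1 - rho_star)
                for l, e in (counts(edges, C) for C in partition) if l)


def vtpm(edges, partition, rho_star):  # Eq. (17)
    return -sum(l * (e / l - rho_star)
                for l, e in (counts(edges, C) for C in partition) if l)


def ground_states(energy, edges, nodes, rho_star, tol=1e-9):
    """All partitions whose energy lies within tol of the minimum."""
    scored = [(energy(edges, P, rho_star), P) for P in set_partitions(list(nodes))]
    best = min(E for E, _ in scored)
    return {frozenset(frozenset(C) for C in P) for E, P in scored if E <= best + tol}


edges = {frozenset(p) for p in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}
for rho_star in (0.3, 0.5, 0.7):
    same = (ground_states(apm, edges, range(6), rho_star)
            == ground_states(vtpm, edges, range(6), rho_star))
    print(rho_star, same)
```

Exhaustive enumeration is only a sanity check; practical detection would sample the energy landscape with local moves instead.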



D. Energy changes upon community assignment 
perturbation 

The APM and VTPM Hamiltonians are sums over all links (node pairs, with or without an edge) that lie within the same community. Thus, when we make some change to the community assignments, for example, by combining two communities, the energy change is completely represented by the sum of energies from links which were just moved into the same community, minus the sum of energies from links which have been removed from the same community. As an example of this, consider adding node x to community B, Fig. 1(b). The entire APM energy remains unchanged except for the l_Bx links between x and the nodes of B. Thus, the energy change can be represented as



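$$\Delta E = -\sum_{b\in B}\left(A_{bx} - \gamma B_{bx}\right) = -\left[e_{Bx} - \gamma\left(l_{Bx} - e_{Bx}\right)\right] = -\frac{l_{Bx}}{1-\rho^*}\left(\rho_{Bx} - \rho^*\right). \tag{19}$$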



In the above, A_bx and B_bx correspond to the adjacency and inverse adjacency matrices as defined previously, while other uses of B within subscripts refer to the community B in Fig. 1; e_Bx is the number of edges between x and B, and γ = ρ*/(1 − ρ*). For the VTPM, the energy change upon perturbation is analogous,



$$\Delta E = -\sum_{b\in B}\left(A_{bx} - \rho^*\right) = -l_{Bx}\left(\rho_{Bx} - \rho^*\right). \tag{20}$$



The energy changes via perturbation have the same form as the global energy: a sum over only the subset of links newly placed within a community, minus a sum over only the subset of links removed from one. The local nature of the energy makes it fast to compute energy changes under system perturbations. Community detection then becomes the fairly well understood problem of sampling an energy landscape with local interactions, using dynamics of the user's choice.
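As a minimal numerical check of Eqs. (19) and (20), the sketch below evaluates both energy changes directly from an adjacency matrix. It assumes an unweighted, undirected graph stored as a NumPy array, takes B_bx = 1 − A_bx for the missing links, and uses γ = ρ*/(1 − ρ*); the function names are hypothetical helpers, not part of any published code.

```python
import numpy as np

def node_to_community_density(adj, community, x):
    """rho_Bx: edges from x into B divided by l_Bx = n_B."""
    return adj[community, x].sum() / len(community)

def delta_e_apm(adj, community, x, rho_star):
    """APM energy change when x joins B (Eq. 19), with gamma = rho*/(1 - rho*)."""
    gamma = rho_star / (1.0 - rho_star)
    a = adj[community, x]          # A_bx for every b in B
    b = 1.0 - a                    # B_bx: 1 where the edge is missing
    return -np.sum(a - gamma * b)

def delta_e_vtpm(adj, community, x, rho_star):
    """VTPM energy change when x joins B (Eq. 20)."""
    return -np.sum(adj[community, x] - rho_star)

# Toy graph: B = {0, 1, 2} is a triangle; node 3 links to 0 and 1 (and to 4).
adj = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
B = [0, 1, 2]

for rho_star in (0.5, 0.7):
    apm = delta_e_apm(adj, B, 3, rho_star)
    vtpm = delta_e_vtpm(adj, B, 3, rho_star)
    # the APM value should equal the VTPM value divided by (1 - rho*)
    print(rho_star, apm, vtpm, vtpm / (1.0 - rho_star))
```

Here ρ_Bx = 2/3, so joining lowers the energy at ρ* = .5 and raises it at ρ* = .7, and in both cases the APM change is the VTPM change rescaled by 1/(1 − ρ*).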

Using Eq. (19), we can see that a node x will join a community B (Fig. 1(b)) if the node-to-community edge density satisfies ρ_Bx > ρ*. A node e′ will leave a community E (Fig. 1(e)) if the node-to-community edge density satisfies ρ_Ee′ < ρ*. We can also show that two communities C, C′ will merge (Fig. 1(c)) if the inter-community edge density satisfies ρ_CC′ > ρ*. Thus, edge density naturally describes all possible community merges and splits, with ρ* being the critical edge density for each of them.

The above shows that communities returned by the Potts models satisfy criteria (b) and (c) of the edge density community definition, Sec. II A. Criterion (c) states that every node x of community B will have ρ_Bx > ρ*. If this were not true in a community, the energy would be lowered by removing node x from the community via the inverse of Eqs. (19) and (20). Criterion (b) states that a community B will grow as long as ρ_B stays greater than ρ*. This is evidenced by the fact that any node x with ρ_Bx > ρ* will have a favorable energy change upon joining B, growing B as large as possible as long as all nodes have sufficient node-to-community edge density. Eq. (22) ensures that the resulting community has ρ_B > ρ*.



IV. MODEL-INDEPENDENT COMMUNITY 
PROPERTIES 

The edge density ρ is not just an arbitrary variable selected because it is simple and leads to a consistent definition of community. It has many theoretical properties which can be compared to real communities, and it helps in the formulation of a consistent community detection framework. In this section, we will derive various properties of ρ which will prove useful in the exploration of the edge density community definition.

Consider a community A with n_A members a_i (Fig. 2). The edge density within A is ρ_A. Each node a_i has an edge density ρ_Aa_i to the rest of the community. The first property which we will show is that the edge density of the community is equal to the average edge density of the component nodes to the community,

$$\rho_A = \left\langle \rho_{Aa_i} \right\rangle, \tag{21}$$



with the average taken over the community members a_i. To show this, we apply the edge density Eq. (1) and evaluate the average in Eq. (21). This leads to

$$\left\langle \rho_{Aa_i} \right\rangle = \frac{1}{n_A}\sum_{a\in A}\frac{e_{Aa}}{l_a} = \frac{\sum_{a\in A} e_{Aa}}{n_A\,(n_A-1)} = \frac{e_A}{\tfrac{1}{2}\,n_A\,(n_A-1)} = \rho_A. \tag{22}$$

In the above, the number of links for each node is constant at l_a = n_A − 1, each internal edge is counted twice in the sum (Σ_a e_Aa = 2e_A), and ½n_A(n_A − 1) is the number of links in a community of n_A nodes.

FIG. 2: Illustration of the mean community edge density. The community edge density ρ_A is equal to the mean ⟨ρ_Aa_i⟩ of all node edge densities. This, and other similar invariants, are among the properties of edge density.
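The invariant of Eq. (21) is easy to check numerically. The sketch below assumes an unweighted graph stored as a dict of neighbour sets (an illustrative choice, as are the helper names) and compares ρ_A from Eq. (1) with the mean node-to-community density ⟨ρ_Aa_i⟩ on a small random community.

```python
import itertools
import random

def community_density(adj, nodes):
    """rho_A = e_A / l_A with l_A = n_A (n_A - 1) / 2."""
    nodes = list(nodes)
    n = len(nodes)
    e = sum(1 for u, v in itertools.combinations(nodes, 2) if v in adj[u])
    return e / (n * (n - 1) / 2)

def node_density(adj, nodes, a):
    """rho_Aa: edges from member a to the other n_A - 1 members, over n_A - 1."""
    others = [v for v in nodes if v != a]
    return sum(1 for v in others if v in adj[a]) / len(others)

# Random community of 8 nodes, each pair linked with probability 0.4.
rng = random.Random(0)
A = list(range(8))
adj = {v: set() for v in A}
for u, v in itertools.combinations(A, 2):
    if rng.random() < 0.4:
        adj[u].add(v)
        adj[v].add(u)

mean_node = sum(node_density(adj, A, a) for a in A) / len(A)
print(community_density(adj, A), mean_node)  # equal up to rounding (Eq. 21)
```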

Next, when a node x joins a community A to form community A ∪ x, there is a relationship between the edge density ρ_{A∪x} of the combined unit and the component edge densities ρ_A and ρ_Ax,

$$\rho_{A\cup x} = \frac{e_{A\cup x}}{l_{A\cup x}} = \frac{e_A + e_{Ax}}{l_A + l_{Ax}} = \frac{l_A}{l_A + l_{Ax}}\,\rho_A + \frac{l_{Ax}}{l_A + l_{Ax}}\,\rho_{Ax}, \tag{23}$$

which is simply the mean of ρ_A and ρ_Ax weighted by the respective numbers of links l_A and l_Ax.

A similar relationship can be shown for two non-overlapping communities A and B with initial edge densities ρ_A and ρ_B and an inter-community edge density ρ_AB. When the two communities merge, the new edge density is the average of the edge density of A, the edge density of B, and the inter-community edge density ρ_AB, weighted by the corresponding numbers of links l_A, l_B, and l_AB,

$$\rho_{A\cup B} = \frac{l_A\,\rho_A + l_B\,\rho_B + l_{AB}\,\rho_{AB}}{l_A + l_B + l_{AB}}. \tag{24}$$



These properties are useful for considering the dynamics of community detection algorithms. For example, for a community A with ρ_A > ρ* and a node x, if we know that x should join A (ρ_Ax > ρ*), then the final edge density ρ_{A∪x} cannot decrease below ρ*. Furthermore, if we have two communities A and B (with ρ_A > ρ* and ρ_B > ρ*) and the inter-community edge density satisfies ρ_AB > ρ* (the criterion for community merging), then after merging we are guaranteed that the new community A ∪ B has ρ_{A∪B} > ρ*, satisfying the edge density condition. These properties serve as a basic check on the sensibility of our community definition.

FIG. 3: Balance between placing a node x into community A or B. In this plot, x has a sufficient edge density to join either community, but when overlapping community assignments are not allowed, x can join only one of them. The text discusses the criteria that edge density models use to assign x to either A or B.
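Equations (23) and (24) can likewise be confirmed numerically. This sketch, again assuming a dict-of-neighbour-sets graph and hypothetical helper names, computes ρ_{A∪B} both directly from e/l and as the link-weighted mean of Eq. (24); passing a single-node B reproduces Eq. (23), where l_B = 0 so ρ_B carries no weight.

```python
import itertools
import random

def count_edges(adj, s, t=None):
    """Edges inside s (t is None) or between disjoint node lists s and t."""
    if t is None:
        return sum(1 for u, v in itertools.combinations(s, 2) if v in adj[u])
    return sum(1 for u in s for v in t if v in adj[u])

def pairs(n):
    """Maximal number of internal links l = n (n - 1) / 2."""
    return n * (n - 1) / 2

def merged_density_weighted(adj, A, B):
    """rho_{A u B} as the link-weighted mean of rho_A, rho_B and rho_AB (Eq. 24)."""
    lA, lB, lAB = pairs(len(A)), pairs(len(B)), len(A) * len(B)
    rA = count_edges(adj, A) / lA if lA else 0.0   # weight is zero when l = 0
    rB = count_edges(adj, B) / lB if lB else 0.0
    rAB = count_edges(adj, A, B) / lAB
    return (lA * rA + lB * rB + lAB * rAB) / (lA + lB + lAB)

def merged_density_direct(adj, A, B):
    """rho_{A u B} straight from the definition e / l."""
    return count_edges(adj, A + B) / pairs(len(A) + len(B))

rng = random.Random(3)
nodes = list(range(10))
adj = {v: set() for v in nodes}
for u, v in itertools.combinations(nodes, 2):
    if rng.random() < 0.5:
        adj[u].add(v)
        adj[v].add(u)

A, B = nodes[:6], nodes[6:]
print(merged_density_weighted(adj, A, B), merged_density_direct(adj, A, B))
print(merged_density_weighted(adj, A, [9]), merged_density_direct(adj, A, [9]))  # Eq. (23), x = 9
```

Because the merged density is a weighted mean, it can never fall below the smallest of ρ_A, ρ_B and ρ_AB, which is what guarantees ρ_{A∪B} > ρ* when all three exceed ρ*.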

When ρ* = 1, all communities must be fully connected (only graph cliques are allowed as communities). When ρ* > 1, communities cannot exist, as edge density cannot be greater than one. In this case, most CD methods will return "communities" which actually consist of single nodes, since multiple nodes can never join together. When ρ* = 0, there is no lower bound on community edge density, and hence no upper bound on community size: all nodes can collapse into one large community spanning the system. However, this may not happen if the graph consists of disjoint subsets of nodes and the exact dynamics of the CD process do not attempt to join disjoint sets of nodes together.



V. MODEL-DEPENDENT COMMUNITY 
PROPERTIES 

The discussion of the properties in the previous section 
makes one critical assumption: when nodes are added 
to communities, they are previously "unassigned," or 
in single-node zero-energy communities. In these cases, 
there is no energy barrier for removing nodes from the 
previous community to which they are bound. This is 
the case when constructing local communities, or when 
overlapping communities are allowed. When this is not 
the case, in order to add a node b to community A, it 
must first be removed from some other community, say 
B. Since the node b is in community B, b must have a 
binding energy to B that must first be overcome. Any 
energy released by moving b to A must first offset the en- 
ergy needed to remove b from B. In order to determine 
trade-offs between larger communities and greater edge density, we must use the Hamiltonian of one of our models. Towards this end, we use the VTPM; since for unweighted graphs it differs from the APM only by a positive scale factor, no generality is lost.

In order to demonstrate the choices which the edge density models make, we will use a simple thought experiment of one node x which can join either community A or B, as in Fig. 3. As a precondition, we must have ρ_Ax > ρ* and ρ_Bx > ρ*; otherwise x could not join both communities. According to the edge density community definition, with ρ_Ax > ρ* and ρ_Bx > ρ*, x would be allowed to join each community in isolation, or both if overlapping community assignments were allowed, and this would be the preferred energy-minimizing move. When overlaps are not allowed, x must choose one of A or B to join.

A similar situation occurs when node b, part of com- 
munity B, has a sufficient edge density to also join A. 
The energy of addition to A must at least compensate 
for the energy of removal of b from B. We imagine this 
in two parts: first, the removal from B, and second, the 
choice between addition to A or B, allowing us to consider only the A, B, x situation of the previous paragraph without loss of generality.

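When using the VTPM, the node x will choose to join the community whose addition lowers the energy most. It will join A when ΔE_Ax < ΔE_Bx, which, using Eq. (20) with l_Ax = n_A and l_Bx = n_B, can be rearranged to

$$n_A\left(\rho_{Ax} - \rho^*\right) > n_B\left(\rho_{Bx} - \rho^*\right). \tag{25}$$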



We see that x will join the community A or B to which it has the largest excess edge density ρ_Cx − ρ*, weighted by the community size n_A or n_B. We note the following: (a) We assume ρ* < 1; the cases where ρ* ≥ 1 are discussed above. (b) We assume ρ_Ax > ρ* and ρ_Bx > ρ*. If both are less than ρ*, x can join neither community; if only one is less than ρ*, the node will join the other community. (c) If ρ* = 0, Eq. (25) reduces to e_Ax > e_Bx, so x will join the community to which it has more edges, which tends to favor the larger community. (d) If ρ_Ax = ρ_Bx, x will join the community of greater size. (e) If n_A = n_B, according to Eq. (25), x will join the community to which it has the greater ρ.
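The decision rule of Eq. (25) fits in a few lines. The sketch below is a hypothetical helper, not part of the authors' code: it compares the VTPM joining energies ΔE_Cx = −n_C(ρ_Cx − ρ*) from Eq. (20) (with l_Cx = n_C) and returns the community whose addition lowers the energy most, or None if neither does. Ties are broken towards A, an arbitrary choice made only for this illustration.

```python
def vtpm_choice(n_a, rho_ax, n_b, rho_bx, rho_star):
    """Pick 'A', 'B' or None for a node x that may join only one community."""
    d_a = -n_a * (rho_ax - rho_star)   # Delta E for joining A (Eq. 20)
    d_b = -n_b * (rho_bx - rho_star)   # Delta E for joining B
    if d_a >= 0 and d_b >= 0:
        return None                    # neither move lowers the energy
    return "A" if d_a <= d_b else "B"  # ties go to A (arbitrary in this sketch)

print(vtpm_choice(10, 0.6, 20, 0.6, 0.5))   # equal densities: larger B wins, case (d)
print(vtpm_choice(10, 0.7, 10, 0.6, 0.5))   # equal sizes: denser A wins, case (e)
print(vtpm_choice(5, 0.9, 20, 0.55, 0.5))   # small dense A beats large sparse B
print(vtpm_choice(10, 0.5, 20, 0.2, 0.0))   # rho* = 0: more edges (5 vs 4) wins, case (c)
print(vtpm_choice(10, 0.4, 10, 0.45, 0.5))  # both below rho*: no move
```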

These results support the idea that a node, when faced with communities of equal sizes, will join the community with which it shares the greatest edge density. However, this preference competes with community size: Eq. (25) weights the excess density by size, so a node trades off joining a smaller, denser community against joining a larger, less dense one.

It deserves emphasis that the results from this section are derived for one particular instance of the edge density community definition, the one derived from the VTPM. For unweighted graphs, this model gives results equivalent to those of the APM, which has been shown to be an effective community detection method in a wide variety of situations [35, 38, 46].



VI. SIMPLE EDGE DENSITY COMMUNITY 
DETECTION ALGORITHMS 



The purpose of this paper is not to discuss specific edge density community detection algorithms; instead, we focus on the ground-state properties of this community definition. There are various published algorithms which can be used directly with edge densities. Previous work from our group used a global algorithm, beginning with every node in a different community and merging communities until some energy minimum is reached [38, 39, 49, 50]. Alternatively, there are various local methods, which build up a single community around single nodes [51-53].

Unlike some other methods, we do not rely on a heuristic to determine the proper ρ*. Instead, we use a multi-replica inference method in which independent "replicas" are solved at the same value of ρ* and their similarity is compared. If, for a given value of ρ*, the independent replicas minimize to give similar community structures, this is considered to be a "good" value of ρ*. This has proven to be a reliable and robust method for community detection. For additional information, see Sec. A 4; further considerations about these methods are found in Appendix A.
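To make the idea of a local edge-density method concrete, the sketch below grows one community around a seed node. It is a hypothetical greedy illustration under stated assumptions (unweighted graph as a dict of neighbour sets, deterministic tie-breaking by node order), not the global algorithm of our earlier work nor any specific published local method: it repeatedly adds the outside neighbour with the largest node-to-community density ρ_Cx while ρ_Cx > ρ*, and drops members whose density to the rest of the community has fallen to ρ* or below.

```python
def grow_local_community(adj, seed, rho_star, max_iter=1000):
    """Greedy edge-density community around `seed` (illustrative sketch)."""

    def rho_to(x, members):
        # density from x to the members other than x itself
        others = members - {x}
        return len(adj[x] & others) / len(others) if others else 0.0

    community = {seed}
    for _ in range(max_iter):  # cap guards against add/remove oscillation
        frontier = set().union(*(adj[v] for v in community)) - community
        best = max(sorted(frontier), key=lambda x: rho_to(x, community), default=None)
        if best is None or rho_to(best, community) <= rho_star:
            break
        community.add(best)
        # remove members that no longer satisfy rho_Cx > rho*
        weak = {v for v in community if len(community) > 1 and rho_to(v, community) <= rho_star}
        community -= weak
    return community

# Two 4-cliques {0..3} and {4..7} joined by the single edge 3-4.
adj = {v: set() for v in range(8)}
for block in (range(0, 4), range(4, 8)):
    for u in block:
        for v in block:
            if u != v:
                adj[u].add(v)
adj[3].add(4)
adj[4].add(3)

print(grow_local_community(adj, 0, 0.5))  # higher rho*: a single clique
print(grow_local_community(adj, 0, 0.1))  # lower rho*: both cliques merge
```

The two calls show ρ* acting as a resolution parameter: the bridge node 4 has density 1/4 to the first clique, so it is rejected at ρ* = .5 and accepted at ρ* = .1.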

VII. EDGE DENSITIES IN REAL NETWORKS 

Our edge density definition assumes that ρ* is a useful resolution parameter, which can select for larger or smaller communities. We have operated on the intuition that a larger ρ* selects for smaller, more densely connected communities, while a smaller ρ* selects for larger, less densely connected communities. In this section, we directly consider this assumption. Recently, Yang and Leskovec (YL) have taken a variety of real networks, mainly social networks, and rigorously studied their properties with respect to size, overlap regions, and other parameters [40]. They find that the number of edges within a community tends to grow as a power of the community size n, with an exponent in the range (1, 2) and observed values of 1.1 and 1.5. For some exponent value ν, it is thus found that

$$e \propto n^{\nu}. \tag{26}$$

However, the maximal number of edges (l in our nomenclature) grows as

$$l = \tfrac{1}{2}\,n\,(n-1) \propto n^{2}, \tag{27}$$

thus we find a scaling relation for the edge density of

$$\rho = \frac{e}{l} \propto \frac{n^{\nu}}{n^{2}} = n^{\nu-2}. \tag{28}$$

As long as ν ∈ (1, 2), ρ decreases as n increases. For example, YL observed social networks to have an exponent value of ν ≈ 1.5, giving us

$$\rho \propto n^{-0.5}. \tag{29}$$

As we see, this supports one of the central tenets of the edge density community definition: as communities grow larger, their edge density tends to decrease. By specifying a ρ*, we implicitly specify a size scale of the communities which we will then detect. Another way to view this is from the standpoint of an agglomerative community detection algorithm. Starting from a dense core, the community edge density on average decreases with each additional node added to the community. As we keep adding nodes, the community eventually grows so large that adding extra nodes would decrease ρ below ρ*.
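The size selection implied by Eqs. (26)-(28) can be illustrated numerically. In this sketch the prefactor c in e = c n^ν is a hypothetical value set to 1, and ν = 1.5 follows the YL estimate quoted above; the helper finds the largest n whose scaled density still exceeds ρ*, relying on ρ(n) decreasing with n, which holds for 1 < ν ≤ 2.

```python
def density_from_scaling(n, nu, c=1.0):
    """rho(n) = e / l with e = c * n**nu (Eq. 26) and l = n (n - 1) / 2 (Eq. 27)."""
    return c * n ** nu / (n * (n - 1) / 2)

def largest_size_above(rho_star, nu, c=1.0, n_max=10**6):
    """Largest community size n >= 2 with rho(n) > rho*, assuming rho decreases in n."""
    if density_from_scaling(2, nu, c) <= rho_star:
        return None
    n = 2
    while n < n_max and density_from_scaling(n + 1, nu, c) > rho_star:
        n += 1
    return n

for rho_star in (0.5, 0.25, 0.1):
    print(rho_star, largest_size_above(rho_star, nu=1.5))
```

Lowering ρ* raises the largest admissible size, which is the sense in which ρ* sets the community size scale.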



VIII. OVERLAPPING COMMUNITIES 

Many networks have community structure which can most naturally be described as overlapping. In this viewpoint, there are certain nodes which can reasonably be included in multiple groups. Perhaps the most standard example of this situation is social networks: any one person will be involved in groups corresponding to work, family, hobbies, etc. The overlaps can consist of single nodes or larger subsets. Not all community detection methods can be extended to handle overlapping nodes. For example, the Newman betweenness algorithm progressively cuts edges until modularity maximization indicates that the final communities have been found [13]. Because each cut is final, there is no ability to create overlapping communities. Other methods, such as clique percolation [7], attempt to detect overlapping community structure. There are also a variety of local community definitions, including the community-centric Potts model formulations of Eq. (8), which can detect overlapping communities by virtue of independently detecting each community [51].

The work of YL also looked at the characteristics of overlapping regions of communities [40]. According to the behavior seen by YL, regions of overlap between communities have edge density contributions from both communities [40, 51, 55]. Thus, these overlap regions have a greater density than the individual communities. It had previously been assumed that regions of overlapping communities have a smaller edge density than either of the non-overlapping regions; past methods of community detection make this opposite assumption and are thus not suitable for community detection with the observed behavior [7, 56, 57]. We can show directly that the edge density community definition properly handles this case.

Let us look at a diagram of two overlapping communities A and B, embedded in a universe of nodes (Fig. 4, upper). We denote the universe of nodes U, the two communities A and B, the region of community A excluding B as A − B and vice versa, the overlap region A ∩ B, and the universe of nodes excluding either community, U − A − B. In the schematic of Fig. 4, communities A and B both have internal edge densities of .5. The overlapping region A ∩ B has an edge density of .75. The edge density between A − B and A ∩ B is .5, and likewise between B − A and A ∩ B.

With ρ* = .75, only the overlapping region A ∩ B will be detected; with ρ* = .5, communities will expand to cover the region A or the region B, but not A ∪ B. The reason for this is that, if we have currently detected the community A, any one node x in B − A will have a ρ = .5 edge density to A ∩ B, but only a ρ = .05 edge density to A − B. Thus the average connection probability to the entire set A is

$$\rho_{Ax} = \frac{.05\,n_{A-B} + .5\,n_{A\cap B}}{n_A} < .5, \quad\text{thus}\quad \rho_{Ax} < \rho^* = .5.$$

This is the desired behavior. The same will be true of a node y in A − B attempting to join community B. Thus, by applying the edge density community definition, we get exactly the desired behavior: for ρ* = .5, we detect A and B, while for ρ* = .75, we detect A ∩ B.

FIG. 4: Schematic of two communities A and B with ρ_A = ρ_B = .5, and their overlap edge densities. This illustrates the affiliation graph model edge densities between each of the distinct regions of the figure, with a "dense overlap" in the A ∩ B region, as determined by YL [40]. According to YL, currently existing community definition models cannot detect these communities. The table indicates the probability for an edge being present between any two regions of the plot: the "−" operator indicates set difference, so A − B indicates the region of A excluding B.
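The argument above reduces to a size-weighted mean. This sketch uses the region densities quoted for Fig. 4 (.5 to A ∩ B and .05 to A − B for a node in B − A) together with hypothetical region sizes, and shows that ρ_Ax stays below ρ* = .5 whatever the split between the two regions.

```python
def density_to_set(parts):
    """Node-to-set edge density: size-weighted mean over (region_size, density) pairs."""
    total = sum(size for size, _ in parts)
    return sum(size * rho for size, rho in parts) / total

rho_star = 0.5
# Hypothetical sizes: n_{A-B} nodes of A outside the overlap, n_overlap inside it.
for n_a_minus_b, n_overlap in [(60, 40), (90, 10), (10, 90)]:
    # node x in B - A: density .05 to A - B and .5 to the overlap (Fig. 4 values)
    rho_ax = density_to_set([(n_a_minus_b, 0.05), (n_overlap, 0.5)])
    print(n_a_minus_b, n_overlap, round(rho_ax, 3), rho_ax > rho_star)
```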



IX. THE AFFILIATION GRAPH MODEL 

In order to model the properties they observed, YL described the "Affiliation Graph Model" (AGM) [40]. This model is similar to stochastic block models in that edges are drawn based only on the community memberships of the two nodes. However, nodes are allowed to belong to multiple communities, and the communities are not restricted to being disjoint. This can be interpreted as a bipartite graph from "people" to shared "affiliations". In the AGM itself, the affiliations are not present as nodes; instead, each shared affiliation contributes a chance of an edge between "people" (nodes in the graph). A similar concept of affiliation grouping has been considered before in a sociological and network context [55, 58], and such work has hinted that under certain generation processes, such models could produce power-law distributions of node degrees [30]. Because this benchmark model incorporates dense overlap, YL claim that current community detection methods are not able to successfully detect communities in this type of graph.

The basic idea behind AGMs is that communities ("affiliations") are chosen from a universe of nodes. These could be non-overlapping and spanning the universe (yielding a planted l-partition, or Stochastic Block Model), or any assortment of hierarchical, overlapping, subset-containing, or other arrangements. Then, we add an edge with probability p for each shared affiliation. Since a pair of nodes can share multiple communities, we allow each shared affiliation a chance to produce an edge. This gives an ultimate edge probability between nodes x and y of

$$p_{xy} = 1 - \prod_{C}\left(1 - p_C\right), \tag{30}$$

where the product is over all communities C which contain both nodes x and y. This is the complement of the probability that none of the shared affiliations produces an edge.

FIG. 5: Schematic of stochastic block model (SBM) vs affiliation graph model (AGM) graphs. (a) The SBM graphs are non-overlapping, while (b) the AGM graphs allow overlaps. Note that in our specific instances of AGM graphs, every node is in at least one community.

The instance of AGM networks we study here is constructed as follows. A universe of N nodes is taken. We take q communities of n nodes each, with nq > N, in two steps: first, we initialize the communities as non-overlapping communities of N/q < n nodes. Then, for each community, we add the additional nodes necessary to make n nodes per community by randomly choosing from all other nodes. There is no restriction on the maximum number of communities to which a node can belong. The edge probability between any two nodes is given by Eq. (30). Our particular AGM graphs can be identified by the parameters (q, n, p). The AGM benchmark graphs differ from stochastic block model graphs by allowing overlaps, Fig. 5.
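The construction above can be sketched as follows. This is a minimal illustration, not the generator used for our results: it assumes N is divisible by q and n ≥ N/q, uses Python's `random` module, and evaluates Eq. (30) as a product of (1 − p_C) over shared communities. Because `agm_edges` accepts a list of (communities, p) layers, the same hypothetical helper also covers the multi-layer graphs described in the following paragraph.

```python
import itertools
import random

def agm_communities(N, q, n, rng):
    """q overlapping communities of n nodes each over nodes 0..N-1.

    Step 1: split a shuffled copy of the nodes into q disjoint blocks of N // q.
    Step 2: top each block up to n members, sampling uniformly from nodes outside it.
    """
    assert N % q == 0 and n >= N // q, "sketch assumes N/q is an integer <= n"
    nodes = list(range(N))
    rng.shuffle(nodes)
    size = N // q
    communities = []
    for k in range(q):
        members = set(nodes[k * size:(k + 1) * size])
        outside = [v for v in range(N) if v not in members]
        members.update(rng.sample(outside, n - size))
        communities.append(members)
    return communities

def agm_edges(layers, rng):
    """Sample edges with p_xy = 1 - prod_C (1 - p_C) over shared communities (Eq. 30).

    layers: list of (communities, p) pairs; a single pair gives a single-layer AGM.
    """
    no_edge = {}  # pair -> probability that no shared affiliation produced an edge
    for communities, p in layers:
        for c in communities:
            for pair in itertools.combinations(sorted(c), 2):
                no_edge[pair] = no_edge.get(pair, 1.0) * (1.0 - p)
    return {pair for pair, s in no_edge.items() if rng.random() < 1.0 - s}

rng = random.Random(1)
comms = agm_communities(N=1000, q=20, n=100, rng=rng)   # (q, n, p) = (20, 100, .25)
edges = agm_edges([(comms, 0.25)], rng)
print(len(edges))
```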

Further, we may produce multi-layer benchmark graphs. It is traditional to produce hierarchical graphs for consideration, where small communities are contained within single large communities. To create our multi-layer graphs, we create AGM graphs independently over the same universe of nodes, and then merge their sets of edges. There is no requirement that each small community be contained within a single large community; in fact, each small community is likely to overlap with many, if not all, of the large communities. The amount of overlap present here is unparalleled in the existing literature. Fig. 6 illustrates the difference between hierarchical and multi-layer community assignments. To produce multi-layer AGMs, we take two AGMs independently from the universe of nodes and overlay their edges. Edge probabilities are still calculated via Eq. (30), but now the product over communities C contains multiple layers of communities. We denote our multi-layer AGMs via the nomenclature ((q_1, n_1, p_1), (q_2, n_2, p_2)). Our final graphs thus combine the overlaps of Fig. 5 with the multi-layer character of Fig. 6.

FIG. 6: Schematic of hierarchical vs multi-layer graphs. (a) The hierarchical graphs have small communities (smaller circles, green) contained entirely within large communities. (b) The multi-layer graphs have small communities which are allowed to span multiple large communities. In our specific instances of AGM graphs, every node is in at least one community, and small communities are assigned completely independently of large communities, allowing extremely high degrees of mixing between small and large communities. There have been no proposals of this type of multi-layer community detection in the literature, but we will demonstrate our methods detecting these communities properly.

We now examine actual results from applying the absolute Potts model and the variable topology Potts model to our overlapping community benchmarks, using the multi-replica inference framework of Sec. A 4. We use the F-score measures of Appendix C to judge the performance of our algorithms. We study both the inter-replica F_1^R, which is used to identify the correct ρ*, and the planted F_1, a measure of how well we detected the planted communities. According to the canonical multiresolution algorithm, extrema of uniformity between replicas (F_1^R) indicate ρ* (or γ) values likely to be significant, while F_1 indicates how well we detect the planted communities. In practice, we want a maximum of F_1 to correspond to the maximum of F_1^R. When both of these measures peak at the same value of ρ*, this indicates that we can recover the planted states and that we would be able to infer the correct value of ρ* if the planted states were not known a priori.

Fig. 7 shows results for single-layer AGM graphs. We see that both the APM and the VTPM perfectly recover the communities and are able to infer the correct value of ρ* (or APM γ) without a priori knowledge: we not only detect the planted communities (planted F_1 = 1), but also know where that resolution lies (inter-replica F_1^R = 1). Fig. 8 shows results from application to multi-layer AGM graphs. We see accuracy comparable to single-layer graphs, with slightly less well-defined peaks. It should be noted that we cannot solve this problem for arbitrary parameters. The (q, n, p) of the two layers are hand-chosen to give good results by adjusting the edge probabilities of large and small communities. Because of this, these results do not necessarily indicate how useful the methods would be for real graphs. We chose this example to demonstrate both the power and the limitations of the method. It is unlikely that real graphs would be as complex to solve as the multi-layer, fully independent random case.

FIG. 7: Community detection results on a single-layer AGM graph with number of communities q = 20, number of nodes per community n = 100, and edge probability p = .25 in a total universe of N = 1000 nodes, showing perfect community detection. F_1 indicates our ability to detect the known community structure; a value of unity indicates perfect community detection. F_1^R indicates our ability to locate the correct resolution (value of ρ*) without a priori information about community sizes. When both measures become unity at the same ρ* value, our methods can both infer the correct resolution and achieve perfect detection at that resolution. Similar results are achievable with a wide range of AGM parameters, demonstrating robustness of the methods. Upper plots show community detection with respect to n_mean, showing that the methods do indeed detect the correct community size n = 100. Column (a) uses the absolute Potts model, and column (b) uses the variable topology Potts model. The lower plots show the role of the respective scale parameter, γ for the APM and ρ* for the VTPM.

In future work, we will systematically apply edge density methods to a wider variety of AGMs, demonstrating the usefulness of edge density methods. The models we will consider will include various combinations of power-law community sizes, different forms of hierarchy, communities within communities, homeless nodes, and more. In addition, we will provide a theoretical analysis of the limits of edge density methods in the face of large amounts of dense overlap.



X. WEIGHTED GRAPHS 

One area of modern community detection research is the subject of weighted graphs. In weighted graphs, not every edge has equal importance. Any modern community detection method should be able to handle weighted graphs. A high weight indicates an edge which plays a major role in the graph, while a low weight indicates an edge which does not significantly affect the graph. An unweighted graph can be considered a weighted graph with edges of weight one, so by convention, a weight of zero corresponds to an edge which is not present, and a weight of one corresponds to an unweighted edge.

FIG. 8: Community detection results on multi-layer AGM graphs with parameters (a) ((q_1 = 16, n_1 = 100, p_1 = .3), (q_2 = 64, n_2 = 25, p_2 = .75)) and (b) ((q_1 = 16, n_1 = 100, p_1 = .35), (q_2 = 64, n_2 = 25, p_2 = .75)). Both inter-replica and known detection measures are plotted as in Fig. 7. Note that these parameters are "cherry-picked" to give good results; successful community detection is not possible for arbitrary parameters. At arbitrary parameters, one layer is perfectly detectable and the other is noticeable but not perfectly detected. Upper/lower rows and left/right columns maintain the same interpretation as in Fig. 7.

In order to adapt edge density to weighted graphs, we make a simple substitution. We consider the number of edges to be the sum of the weights w_i of all edges i,

$$e = \sum_i w_i, \tag{31}$$

with the sum taken over all edges. Thus, the edge density for any grouping of nodes X is

$$\rho_X = \frac{e_X}{l_X} = \frac{\sum_{i \in X} w_i}{l_X}. \tag{32}$$



The grouping X can be that of any community A, a pair of communities AB, or any of the other situations depicted in Fig. 1. As is to be expected of any reasonable extension of its unweighted counterpart, Eq. (32) reduces for unweighted graphs to the same definition appearing in Eq. (1). Furthermore, edges can be weighted greater than one for very important edges. Adversarial edges (edges which favor their endpoints being in different communities) can be weighted less than zero. All of our analysis in the previous sections remains valid mutatis mutandis, with the caveat that ρ is no longer limited to the range [0, 1]. The edge density can exceed unity when there are many edges with weight greater than one, or fall below zero if there are enough adversarial edges. Furthermore, we maintain a property of linearity of edge densities. If linearity is not desired, a power (or other mapping function) could be applied to the edge weights before summing to get the "number of edges" stand-in.
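A minimal sketch of Eqs. (31) and (32) follows, assuming edge weights are stored in a dict keyed by frozenset node pairs (an illustrative layout). The optional `transform` argument is a hypothetical hook for the non-linear mapping mentioned above; by default weights enter linearly, and only edges that are present contribute.

```python
import itertools

def weighted_density(weights, nodes, transform=lambda w: w):
    """rho_X = (sum of transformed edge weights inside X) / l_X, Eqs. (31)-(32).

    weights: dict mapping frozenset({u, v}) -> weight; missing pairs have no edge.
    """
    nodes = list(nodes)
    l = len(nodes) * (len(nodes) - 1) / 2
    e = 0.0
    for u, v in itertools.combinations(nodes, 2):
        w = weights.get(frozenset((u, v)))
        if w is not None:
            e += transform(w)
    return e / l

path = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
heavy = {frozenset((0, 1)): 2.0, frozenset((1, 2)): 2.0, frozenset((0, 2)): 1.0}
adversarial = {frozenset((0, 1)): -1.0, frozenset((1, 2)): -1.0}
for name, g in [("unweighted", path), ("heavy", heavy), ("adversarial", adversarial)]:
    print(name, weighted_density(g, [0, 1, 2]))
```

The three cases reproduce the behaviors described above: the unweighted value of Eq. (1), a density above unity for heavy edges, and a negative density for adversarial edges.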

The APM Hamiltonian, Eq. (8), can handle weighted graphs, with A_aa′ being a matrix of edge weights for present edges and B_aa′ = 1 if there is no edge. There are two caveats. First, the APM uses the number of missing edges, for which we substituted l − e (the maximal number of possible edges minus the number of actual edges). However, when e = Σ_i w_i, this is no longer necessarily true: e is no longer identical to the number of existing edges. Thus, Eq. (14) no longer corresponds exactly to the original APM Hamiltonian. Second, in our final density formulation of the APM Hamiltonian (Eq. (14)), there is a factor of 1/(1 − ρ*). This factor diverges at ρ* = 1 and becomes negative for ρ* > 1, rendering our derivations invalid.

The VTPM avoids both of these limitations and allows a natural extension to weighted graphs, with no discontinuity for ρ* < 0 or ρ* > 1. We reiterate that, as we have shown, the APM and VTPM are equivalent for unweighted graphs. Even without dedicated experimentation, the VTPM therefore inherits the demonstrated performance of the APM on unweighted graphs, while also applying to a broader class of graphs.



XI. EXTENSIONS 

Multigraphs are graphs that allow more than one edge between any pair of nodes. As we sum over all edges, multigraphs can be very naturally included in the summation over edges. However, note that under this formulation, a multigraph is seen as equivalent to a weighted graph in which each pair of nodes is joined by a single edge whose weight equals the sum of the weights of the parallel edges between them. If this procedure loses essential information about the graph, a different method will be needed. Perhaps multiple edges could be combined into one with a different weighting function; however, without a rigorous analysis of the most important aspects of multigraphs, we can only speculate. For now, all we do is point out that edge density is not incompatible with the concept of multigraphs.
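The reduction of a multigraph to a weighted graph amounts to summing parallel edges. The sketch below assumes a multigraph given as a list of (u, v, weight) triples (an illustrative input format, with weight 1 for an unweighted parallel edge) and returns a frozenset-keyed weight dict of a simple weighted graph, to which Eq. (32) can then be applied.

```python
from collections import defaultdict

def collapse_multigraph(edge_list):
    """Merge parallel edges: each node pair gets the sum of its edge weights."""
    weights = defaultdict(float)
    for u, v, w in edge_list:
        weights[frozenset((u, v))] += w  # direction is ignored (undirected graph)
    return dict(weights)

multi = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (0, 1, 0.5)]
print(collapse_multigraph(multi))
```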

If our graphs have multiple types of edges, or multiple distinct forms of weight per edge w_0, w_1, ..., generalized to a vector w = (w_0, w_1, ...), we can generalize the total edge weight to the scalar w · u, with u being a unit vector weighting the different individual weight contributions. This allows us to use a different combination of component weights depending on our intended community detection goals.
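For the multi-type case, the combination w · u is a per-edge dot product. The sketch assumes two hypothetical weight channels per edge and a user-chosen direction u that is normalised to unit length, as in the text; different choices of u emphasise different edge types before the scalar weights enter Eq. (32).

```python
import math

def combine_weights(weight_vectors, u):
    """Map each edge's weight vector w to the scalar w . u, with u normalised to unit length."""
    norm = math.sqrt(sum(x * x for x in u))
    u_hat = [x / norm for x in u]
    return {edge: sum(wi * ui for wi, ui in zip(w, u_hat))
            for edge, w in weight_vectors.items()}

# Hypothetical channels per edge: (w_0, w_1), e.g. two different interaction types.
vectors = {frozenset((0, 1)): (2.0, 0.0), frozenset((1, 2)): (0.0, 1.0)}
print(combine_weights(vectors, (1.0, 0.0)))  # only channel 0 counts
print(combine_weights(vectors, (1.0, 1.0)))  # both channels, equal emphasis
```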

The methods presented in this paper use only edges, and not higher-order correlations such as triangles (3-cliques). In order to use, e.g., triangles, we would define a "triangle density" as

$$\rho_C^{\triangle} = \frac{t_C}{\tfrac{1}{6}\,n_C\,(n_C-1)\,(n_C-2)}, \tag{33}$$

with t_C being the number of triangles in community C, n_C being the number of nodes in community C, and the denominator being the maximal possible number of triangles in community C. This could be extended rather trivially to other higher-order correlations of any structural motif. Many real networks have been shown to contain non-trivial patterns in these higher-order correlations, which are for the most part not presently considered in community detection methods [59].
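Eq. (33) can be evaluated by brute force on small communities. The sketch assumes a dict-of-neighbour-sets graph and enumerates all node triples, which costs O(n_C³) and is meant only as an illustration.

```python
import itertools

def triangle_density(adj, nodes):
    """rho_C^triangle = t_C / (n_C (n_C - 1) (n_C - 2) / 6), Eq. (33)."""
    nodes = list(nodes)
    n = len(nodes)
    t = sum(1 for a, b, c in itertools.combinations(nodes, 3)
            if b in adj[a] and c in adj[a] and c in adj[b])
    return t / (n * (n - 1) * (n - 2) / 6)

k4 = {v: {u for u in range(4) if u != v} for v in range(4)}   # complete graph K4
k4_minus = {v: set(nbrs) for v, nbrs in k4.items()}
k4_minus[0].discard(1)                                          # remove edge 0-1
k4_minus[1].discard(0)
print(triangle_density(k4, range(4)), triangle_density(k4_minus, range(4)))
```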



XII. DISCUSSION 

Although edge density describes many benchmarks and real graphs effectively, our definitions and protocols have certain limitations. While these limitations exist, past use of edge density methods has shown that they are not barriers for most important problems. Because the edge density community definition is simple, we expect it to be general and easily extendable.

For large sparse networks, the total number of edges is proportional to the number of nodes (e ∝ cn, c = O(1)), while the number of possible edges is l = ½n(n − 1). The edge densities are therefore driven to zero as the number of nodes per community n increases,

$$\rho = \frac{e}{l} \propto \frac{c\,n}{\tfrac{1}{2}\,n\,(n-1)} \propto \frac{1}{n} \to 0. \tag{34}$$



When all edge densities become small, it becomes progressively more difficult for the parameter ρ* to distinguish communities. This problem is discussed in more general terms in a companion work [60]. Overall and inter-community edge densities decrease with N (the total number of nodes), while intra-community edge density decreases with n, the number of nodes per community. However, in the case of approximately fixed-size communities, the inter-community edge density is driven low much faster than the intra-community edge density, allowing ρ* still to be used successfully to distinguish the communities. The success of the edge density community definition will depend on the precise graph and "ground-truth" community characteristics, and can be studied over various classes of graphs.
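A back-of-the-envelope sketch of the fixed-size argument, under idealized assumptions of our own: N nodes are split into equal communities of n nodes, and every node has k_in neighbours inside its community and k_out outside it. Then the intra-community density k_in/(n - 1) does not depend on N, while the inter-community density k_out/(N - n) falls roughly as 1/N. The function name `mean_densities` and its parameters are hypothetical.

```python
def mean_densities(N, n, k_in, k_out):
    """Idealized intra- and inter-community edge densities.

    N nodes are split into N/n communities of n nodes; every node has k_in
    neighbours inside its own community and k_out outside it (assumptions).
    """
    # Inside one community: e_C = n*k_in/2 edges out of l_C = n(n-1)/2 pairs.
    rho_in = k_in / (n - 1)
    # Between communities: N*k_out/2 edges out of N(N-n)/2 cross-community pairs.
    rho_out = k_out / (N - n)
    return rho_in, rho_out


for N in (100, 1_000, 10_000):
    rho_in, rho_out = mean_densities(N, n=20, k_in=10, k_out=5)
    print(f"N={N:>6}  rho_in={rho_in:.3f}  rho_out={rho_out:.5f}")
```

Letting n grow at fixed k_in instead drives rho_in to zero, reproducing the decay of Eq. (34).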

Studies of real networks have shown a power law distribution of community sizes [61, 62]. A variation in community size does not affect the edge density community definition: as long as each community has a uniform probability p of each of its internal edges existing, the edge density community definition will be able to properly detect its communities.

Studies have also shown that real networks have power-law distributions of node degrees [551 I53T[55]. If, upon further analysis, this power law degree distribution corresponds to a power law distribution of internal degrees (number of edges connecting to other nodes inside a community), then the distribution of node (a) to community (A) densities $\rho_{Aa} = e_{Aa}/(n_A - 1)$ will be power law distributed as well. The edge density community definition can still be relevant to these graphs. In these cases, ρ* will determine the edge density for the lowest-degree nodes of the community. The high-degree nodes will still be included as part of the community to which they have the greatest connecting edge density. If a high-degree node has many of its edges spread out among other communities, it will not be misclassified as long as its node-to-correct-community edge density is greater than ρ* and it shares the greatest edge density with the correct community. The work of Lattanzi and Sivakumar [66] provides another possibility, which will be investigated in a future work. Using a model similar to AGMs, they have shown a natural appearance of power law behavior in affiliation graphs. If this holds for AGMs, then power law node degrees may be consistent with constant internal densities, which would render our current work relevant to such graphs.



According to the simple definitions listed here, every node must have an edge density greater than ρ* to its community. This appears to be a fairly heavy restriction: a node with less than ρ* edge density to every community will remain isolated in a community of its own. One way of working around this would be to allow isolated nodes to join whichever community they have the greatest edge density connection with. Alternative schemes could be developed in which only the average community edge density must be greater than ρ*, and certain individual nodes can have an edge density of less than ρ*. Regardless, internal edge density is only capable of detecting assortative communities, in which nodes are connected to similar nodes.
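The node-to-community density $\rho_{Aa}$ and the membership rule discussed above can be sketched as follows. For a node that is not yet in A we assume its possible partners are all n_A members (for a member it is the n_A - 1 others). The function names and the `fallback` switch, which implements the workaround of joining the best-connected community, are hypothetical.

```python
def node_community_density(adj, node, community):
    """rho_{Aa}: existing edges between `node` and community A over possible ones.

    For a member of A the possible partners are the n_A - 1 other members; for a
    non-member we assume all n_A members are possible partners.
    """
    others = [v for v in community if v != node]
    if not others:
        return 0.0
    e_aA = sum(1 for v in others if v in adj[node])
    return e_aA / len(others)


def choose_community(adj, node, communities, rho_star, fallback=False):
    """Index of the community with the greatest density to `node`, or None.

    Strict rule: the best density must exceed rho_star, otherwise the node stays
    alone. With fallback=True an isolated node instead joins its best-connected
    community, provided it has at least one link to it.
    """
    if not communities:
        return None
    scores = [node_community_density(adj, node, c) for c in communities]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] > rho_star or (fallback and scores[best] > 0):
        return best
    return None


adj = {"x": {"a1", "b1", "b2"}}  # only x's neighbourhood is needed here
A, B = {"a1", "a2", "a3", "a4"}, {"b1", "b2", "b3"}
print(choose_community(adj, "x", [A, B], rho_star=0.5))                 # 1
print(choose_community(adj, "x", [A, B], rho_star=0.8))                 # None
print(choose_community(adj, "x", [A, B], rho_star=0.8, fallback=True))  # 1
```

Node x has density 1/4 to A and 2/3 to B, so it joins B at ρ* = 0.5, is left alone at ρ* = 0.8 under the strict rule, and joins B again once the fallback is enabled.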



In reality, no single community definition is expected to work across all classes of graphs. Edge densities may describe graphs arising from a certain generation process (such as shared-affiliation social networks), information flows may describe data-centric networks, and other methods may describe scientific citation networks, among many possibilities. No single community definition should be studied to the exclusion of others, and it is important to understand all community definitions in order to know the realm of their applicability. Furthermore, rigorously understanding the communities returned by a variety of community detection algorithms opens a tantalizing inverse community detection possibility. Suppose we had a graph with an unknown complex structure but known ground-truth communities. By applying community detection algorithms to this graph, and comparing the returned communities to the planted communities, we may learn something about the graph generation process itself.
XIII. AVOIDING OVER-OPTIMIZATION



As stated in Sec. II B, the edge density variable ρ is directly analogous to the probability p used to generate many common benchmarks. Since our detection method exactly matches the creation process of common benchmark graphs, it is easy to understand why we achieve such accurate detection. Our definition will be useful in cases where real graphs are generated in an analogous manner. The processes which generate various real-world networks are a current topic of research; for example, processes have been proposed that give rise to power law degree distributions, and others with constant edge probability, as in AGMs. Once more is known about these processes, we can better understand which community definitions are optimal. In addition, there is no reason to believe that one community definition will be optimal in all types of graphs, and we must have many techniques available in our toolbox to be able to respond to whatever problems may occur.

Nevertheless, many current benchmark graphs are generated using edge densities, and can be described as such. Our companion work expands on this and uses edge density, the lowest common denominator of many methods, to learn about the structural source of community detection limits in stochastic block model graphs. Furthermore, due to the highly symmetric nature of these equal-sized stochastic block model graphs, our edge density results generalize to almost all community detection methods.



XIV. CONCLUSIONS 

Edge density community detection methods have been used before, but the underlying theory of such methods has not been explicitly defined and fully explored. This work fills this gap, and in the process formally expands the edge density community definition to include important new classes of graphs, such as weighted graphs, and the possibility of overlapping communities. One of the most important features of our edge density community definition is that it is simple. One equation, ρ = e/l (Eq. Q), is all that is needed to express the core concept, yet our definitions and methods apply within communities, between communities, and for specific nodes. This simplicity is directly linked to the generality of the definition.

To make our formulation concrete, we first discussed an existing edge density community definition, the absolute Potts model. Methods based on Potts models have historically proven to be very accurate, but have shown limitations for weighted graphs. To work around this limitation, we proposed a new edge density model, the variable topology Potts model, and showed its equivalence to the absolute Potts model for unweighted graphs. This generalization points the way for community detection in all major types of graphs. It is worth emphasizing that our edge density definition is a local community definition, where nodes and communities are only affected by their nearest neighbors. This means that algorithms based on it can be easily scaled to large data sets, including those where only a small portion of the graph is discoverable.

The core of this work involved developing criteria which must be satisfied for any community assignment to be optimal. The criteria are developed with respect to various changes in community structure (community merging, addition of a node to a community, etc.) being energetically favorable. This is an important part of the evaluation of any community definition or algorithm, since it provides intuition as to how the algorithms apply to real-world graphs. Furthermore, by developing a set of criteria for various community characteristics, one can check that no unrealistic properties emerge. A chief example of such an unrealistic property, and of the application of this technique, was Fortunato and Barthélemy's investigation of a community merge criterion within the modularity community definition, which revealed a resolution limit [17].

We have applied our edge density methods to the AGM, a recently proposed benchmark graph model [40]. The creators of this benchmark model claim that no currently existing method can solve it, though it is likely that methods other than edge density can also do so. We have solved it exactly, and demonstrated why we are able to do so. Our analysis of the AGM hints at the underlying reason for the accuracy of the edge density community definition. We postulate that the reason the edge density community definition performs so well on benchmark graphs is related to the fact that many benchmark graphs are created with an edge probability p as the independent variable, to which our ρ* is analogous. Nevertheless, edge density methods have proven to be valuable when applied to real-world networks [371 [5U] and the latest standard benchmark graphs (LFR, AGM) gOl S3 E7J. This provides evidence that edge densities, and the properties subsequently derived from them, do in fact reflect characteristics of important real-world networks.
Appendix A: Simple edge density community
detection algorithms 

In the past few sections, we have outlined the ingredi- 
ents necessary for a community definition based on edge 
density. We will now outline several simple procedures 
for applying this definition to an algorithm. These dy- 
namical procedures follow the precedent set by existing 
literature, and our ability to use our definition, coupled 
with multiple forms of dynamics, illustrates the sepa- 
ration between community definitions and dynamics of 
community partitioning. 



1. Global method of Ronhovde, Hu, and Nussinov 
(RHN) 

This method takes an entire graph and partitions it all at once, resulting in a single community assignment for every node [Ml EM SSI [5U]. RHN use a global ρ* (in the form of the absolute Potts model parameter γ) and demonstrate an adaptation to overlapping nodes.

In the RHN approach, we initially assign every node to its own single-node community, so that there are as many initial communities as there are nodes. Then, we make repeated passes of the following changes in community assignments until we reach a point of local stability: one in which none of the following moves will lower the energy any further. More explicitly, the moves are:

• Local shifts. Choose one node, and change the com- 
munity assignment of the node to another already- 
existing community. Accept if the change lowers 
the energy. If the previous community only con- 
sisted of one node, that community vanishes and 
our number of communities q shrinks by one. 

• New communities. Choose one node in a commu- 
nity of more than one node, and attempt to move it 
into a completely new community (which will then 
have only one node in it). Accept if the change 
lowers the energy. This increases the number of 
existing communities by one. 

• Merges. Attempt to merge two existing communi- 
ties. Accept if the change lowers the energy. This 
move contracts the number of communities by one. 

The entire procedure outlined above is a steepest-descent algorithm. In order to get over energy barriers, we perform t independent trials with different random seeds (either for the initial community assignments, or for the order of traversing nodes and communities) as a means of gaining improved sampling. Typically, on the order of 5 trials are needed, and this is found to be much more efficient in computer resources than simulated annealing-like algorithms [Ml HE HE ED].
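A minimal Python sketch of the three RHN moves and the multi-trial loop follows. It assumes that the unweighted absolute Potts model energy takes the form $H = -\sum_C [e_C - \gamma(l_C - e_C)]$, with $e_C$ the internal edges and $l_C = n_C(n_C - 1)/2$ the possible internal edges of community C; under that assumption a community has negative energy exactly when $e_C/l_C > \gamma/(1+\gamma)$, which is how γ plays the role of ρ*. The names `apm_energy`, `rhn_trial`, and `rhn` are ours, and for clarity every trial move recomputes the full energy, which is far slower than the incremental updates a practical implementation would use.

```python
import random
from collections import defaultdict


def apm_energy(adj, labels, gamma):
    """Assumed unweighted APM energy: -sum_C [e_C - gamma * (l_C - e_C)]."""
    groups = defaultdict(set)
    for v, c in labels.items():
        groups[c].add(v)
    energy = 0.0
    for members in groups.values():
        n_c = len(members)
        e_c = sum(1 for v in members for u in adj[v] if u in members) / 2
        l_c = n_c * (n_c - 1) / 2
        energy -= e_c - gamma * (l_c - e_c)
    return energy


def rhn_trial(adj, gamma, seed):
    """One greedy descent from singletons using shifts, new communities and merges."""
    rng = random.Random(seed)
    labels = {v: i for i, v in enumerate(adj)}  # q = N single-node communities
    fresh = len(labels)  # next unused community label
    energy = apm_energy(adj, labels, gamma)
    changed = True
    while changed:
        changed = False
        nodes = list(labels)
        rng.shuffle(nodes)
        for v in nodes:  # local shifts, plus new-community moves
            size = sum(1 for c in labels.values() if c == labels[v])
            targets = sorted(set(labels.values()) - {labels[v]})
            if size > 1:
                targets.append(fresh)  # move v into a brand-new community
            rng.shuffle(targets)
            for c in targets:
                trial = dict(labels)
                trial[v] = c
                e = apm_energy(adj, trial, gamma)
                if e < energy - 1e-12:  # accept only energy-lowering moves
                    labels, energy, changed = trial, e, True
                    if c == fresh:
                        fresh += 1
                    break
        comms = sorted(set(labels.values()))
        for i, a in enumerate(comms):  # merges of community pairs
            for b in comms[i + 1:]:
                trial = {v: (a if c == b else c) for v, c in labels.items()}
                e = apm_energy(adj, trial, gamma)
                if e < energy - 1e-12:
                    labels, energy, changed = trial, e, True
    return labels, energy


def rhn(adj, gamma, trials=5):
    """Run independent trials with different seeds and keep the lowest energy."""
    return min((rhn_trial(adj, gamma, s) for s in range(trials)), key=lambda r: r[1])


# Usage: two 4-cliques joined by the single edge 3-4.
adj = {v: set() for v in range(8)}
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in block:
        adj[i].update(j for j in block if j != i)
adj[3].add(4)
adj[4].add(3)
labels, energy = rhn(adj, gamma=1.0)
groups = sorted(sorted(v for v in labels if labels[v] == c) for c in set(labels.values()))
print(groups, energy)
```

With γ = 1 (ρ* = 1/2 under the assumed energy), the descent separates the two cliques, each contributing -6 to the energy; merging them would raise the energy to +2.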

RHN also created an extension of this approach to overlapping communities [47]. After performing the above steepest-descent non-overlapping algorithm, they apply the following moves until they achieve a locally stable configuration:

• Overlap expansion. For each community, attempt 
to add each node not currently in that community. 
Do not remove the node from the previous commu- 
nity. The number of communities stays constant, 
but one community gains an extra node. Accept 
any additions which lower energy. 

• Overlap contraction. For each community and for 
each node in that community which was not among 
the original nodes pre-expansion, attempt to re- 
move that node from the community. Accept any 
removals which lower energy. 
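The overlap moves can also be sketched in Python. This version assumes the same APM-style energy as a sum over the communities of a cover, so that a node belonging to two communities contributes to both terms. It takes any non-overlapping partition as input. The names `cover_energy` and `overlap_refine` are hypothetical, and contraction only considers nodes that were added during expansion, as described above.

```python
def cover_energy(adj, communities, gamma):
    """Assumed APM-style energy of a cover: -sum_C [e_C - gamma * (l_C - e_C)]."""
    total = 0.0
    for members in communities:
        n_c = len(members)
        e_c = sum(1 for v in members for u in adj[v] if u in members) / 2
        total -= e_c - gamma * (n_c * (n_c - 1) / 2 - e_c)
    return total


def overlap_refine(adj, partition, gamma):
    """Overlap expansion/contraction applied to a non-overlapping partition."""
    comms = [set(c) for c in partition]
    original = [set(c) for c in partition]
    energy = cover_energy(adj, comms, gamma)
    changed = True
    while changed:
        changed = False
        for c in comms:  # expansion: add outside nodes, keep their old membership
            for v in sorted(set(adj) - c):
                c.add(v)
                e = cover_energy(adj, comms, gamma)
                if e < energy - 1e-12:
                    energy, changed = e, True
                else:
                    c.discard(v)
        for c, orig in zip(comms, original):  # contraction: only nodes added above
            for v in sorted(c - orig):
                c.discard(v)
                e = cover_energy(adj, comms, gamma)
                if e < energy - 1e-12:
                    energy, changed = e, True
                else:
                    c.add(v)
    return comms, energy


# Usage: two 4-cliques that share node 3; the input partition gives 3 to one side.
adj = {v: set() for v in range(7)}
for block in ([0, 1, 2, 3], [3, 4, 5, 6]):
    for i in block:
        adj[i].update(j for j in block if j != i)
comms, energy = overlap_refine(adj, [{0, 1, 2, 3}, {4, 5, 6}], gamma=1.0)
print(comms, energy)
```

Starting from {0, 1, 2, 3} and {4, 5, 6} with γ = 1, expansion adds node 3 to the second community as well, after which neither move lowers the energy any further.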

This algorithm has been shown to have exceptionally good performance and efficiency. A disadvantage of the original method is that it uses a global ρ*; more recently, Ronhovde and Nussinov extended this approach to infer local structure [57].

2. Local algorithm of Lancichinetti, Fortunato, and 
Kertesz (LFK) 

In 2008, LFK developed a local fitness function and corresponding dynamics for local community detection algorithms [5TM53]. This function shares some similarity with edge density, although it is distinct in using external links as part of the measure of local fitness. Their dynamics can easily be adapted to our Potts models for edge density.

For this method, we choose a starting node from which 
to base our community designations. We then loop the 
following steps: 

• For each community, take the set of all nodes adja- 
cent to, but not within, the community. Calculate 
the change in community fitness if each node were 
to be added to the community. Add only the node 
which increases the fitness function by the most. 
Repeat until no further additions can increase the 
fitness. 

• For each community, calculate the change in fitness 
if each node individually were to be removed. If any 
node would increase fitness by its removal, remove 
the one node which most increases fitness. Repeat 
until no further nodes can increase fitness by their 
removal. 

• Once a locally stable community is found, repeat 
the procedure starting from another node which has 
not yet been assigned to any community. 
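The LFK-style loop can be sketched with an edge-density fitness in place of the LFK fitness. Here we assume the fitness of a single community is the negative of its absolute-Potts-style energy term, $f(C) = e_C - \gamma(l_C - e_C)$, so that, unlike the LFK function, it uses only internal edges. The names `fitness`, `grow_local`, and `local_cover` are hypothetical, ties are broken by node order, and the sketch is not the LFK implementation.

```python
def fitness(adj, comm, gamma):
    """Assumed edge-density fitness of one community: e_C - gamma * (l_C - e_C)."""
    n_c = len(comm)
    e_c = sum(1 for v in comm for u in adj[v] if u in comm) / 2
    return e_c - gamma * (n_c * (n_c - 1) / 2 - e_c)


def grow_local(adj, seed, gamma):
    """Greedy local growth from `seed`: best single addition, then best removal."""
    comm = {seed}
    while True:
        changed = False
        while True:  # add the frontier node that raises fitness the most
            base = fitness(adj, comm, gamma)
            frontier = sorted({u for v in comm for u in adj[v]} - comm)
            gains = {u: fitness(adj, comm | {u}, gamma) - base for u in frontier}
            best = max(gains, key=gains.get, default=None)
            if best is None or gains[best] <= 0:
                break
            comm.add(best)
            changed = True
        while len(comm) > 1:  # remove the member whose removal raises fitness most
            base = fitness(adj, comm, gamma)
            gains = {v: fitness(adj, comm - {v}, gamma) - base for v in sorted(comm)}
            worst = max(gains, key=gains.get)
            if gains[worst] <= 0:
                break
            comm.remove(worst)
            changed = True
        if not changed:
            return comm


def local_cover(adj, gamma):
    """Restart growth from nodes not yet in any community (overlaps allowed)."""
    covered, cover = set(), []
    for v in sorted(adj):
        if v not in covered:
            comm = grow_local(adj, v, gamma)
            cover.append(comm)
            covered |= comm
    return cover


# Usage: two 4-cliques joined by the single edge 3-4.
adj = {v: set() for v in range(8)}
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in block:
        adj[i].update(j for j in block if j != i)
adj[3].add(4)
adj[4].add(3)
print(local_cover(adj, gamma=1.0))
```

Growth from node 0 recovers {0, 1, 2, 3}, but growth seeded at the bridge node 4 first adds node 3 (all first additions tie) and then stalls at {3, 4}, so the resulting cover is {0, 1, 2, 3}, {3, 4}, {4, 5, 6, 7}. This is one concrete way in which the order of additions and removals can affect a local minimization.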

This procedure allows overlapping communities to be found, and allows a locally tunable ρ*, as opposed to a global ρ*. This method has not been applied to our edge density community definition, but it can easily be applied via local optimization of Eq. (14). This algorithm provides a method of community detection when only a small portion of the graph is visible.



3. Advanced methods 

Methods such as simulated annealing or heat bath algorithms are extensions of the above methods [22, 49]. As the edge density community definition is based on a Hamiltonian and not on a set of dynamical steps, we have the freedom to choose any dynamics we like to minimize energies. Energy optimization has been extensively studied in the physics literature, and many lessons can be taken from spin glasses, molecular dynamics, and other fields.

It would be useful to study the ability of various methods to overcome local energy barriers. For example, RHN noticed that community merge moves were important in order to surmount local barriers [68]. Without these, single-node community shifts were not able to effectively lower energies. There are subtle differences between the order of additions and removals in the RHN and the LFK methods, which could affect the performance of the minimizations. These methods will not be discussed further here.



4. Multi-replica inference 

In order to use the methods outlined above, we must know ρ* before we begin our detection process. Since this, in general, cannot be known in advance, we need to infer ρ* via some technique. There is an established procedure for this multi-resolution analysis in the literature [39].

To do this, we scan across a range of ρ* values and perform multiple community detections ("r replicas") at each ρ*. We infer that if the results from community detections at a given ρ* are very similar, we have good community detection. This is equivalent to saying that at good values of ρ*, we have one dominant community assignment that is uniformly detected. Past work has shown this to be a very good procedure [351 ESI 1SH].
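The replica scan can be written generically. In this sketch, `detect(rho_star, seed)` stands for any detection routine from this appendix and `similarity(a, b)` for any measure normalized to 1 on identical partitions; both are hypothetical placeholders. Averaging over unordered pairs assumes the measure is symmetric; for an asymmetric measure one would average over both argument orders.

```python
from itertools import combinations


def inter_replica_similarity(replicas, similarity):
    """Mean of similarity(a, b) over all unordered pairs of replicas (r >= 2)."""
    pairs = list(combinations(replicas, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def scan_rho_star(detect, similarity, rho_values, r=4):
    """Run r replicas per rho* and return (most uniform rho*, all scores).

    detect(rho_star, seed) -> partition is a user-supplied detection routine;
    its name and signature are placeholders, not a fixed API.
    """
    scores = {}
    for rho in rho_values:
        replicas = [detect(rho, seed) for seed in range(r)]
        scores[rho] = inter_replica_similarity(replicas, similarity)
    return max(scores, key=scores.get), scores
```

Taking the global maximum naively would also reward trivial solutions, such as a single giant community that every replica reproduces, so in practice such trivially uniform plateaus should be excluded before choosing ρ*.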

There are various measures of inter-replica similarity, most based on the concepts of information theory [70HT2]. Various proposed choices include the variation of information (VI) [38, 73, 74], normalized mutual information ($I_N$) [38, 75], and a generalized normalized mutual information capable of handling overlaps (N) [5T]. In this work, we will use a version of the F-score, $F_1$, generalized to handle partitions (see Appendix C) [SHI [S3]. The unifying characteristic of these measures is that they take two complete community assignments and output a number which indicates the similarity of the partitions. Most measures, including the F-score, are normalized as follows: a value of unity indicates perfect agreement in partitions, while a value of zero indicates completely decorrelated community assignments.

Our F-score development extends the concept of comparing two partitions of the system to overlapping nodes, similar to, but conceptually simpler than, the developments of Lancichinetti et al. [51]. Our derivation provides more insight into the actual performance of the community detection algorithm. Full details are located in Appendix C. There, we derive a modification of $F_1$ for comparing entire partitions (instead of single communities) to each other, which we denote $F_1^P$. $F_1^P$ has the following interpretation: a value of unity indicates perfect agreement between community assignments, and a value approaching zero indicates perfect decorrelation of community structure.

These similarity measures (VI, $I_N$, $F_1^P$, etc.) are also used for testing the outcomes of community detection experiments. If we know the correct community assignment (community assignment $R_0$), we expect the similarity between the detected and correct communities to be unity when averaged across the detected configurations $R_1, R_2, \ldots, R_r$, using similarity measure S. To compute the inter-replica similarity, we use
Thus we can state our general criterion for a useful multi-resolution algorithm: as a function of ρ*, $S^{IR}$ must have a maximum at the same locations where $S^0$ reaches unity, in order to have both accurate community detection and the ability to infer the correct ρ* with no prior information.



Appendix B: Why do current methods not properly 
handle dense overlaps? 

One of the claims of YL is that current community detection methods do not properly handle dense overlaps as in Fig. 4 [40]. In this section, we will explain why that is the case for certain popular methods. For our thought experiment, consider two communities A and B, with a dense overlap as shown in that figure.



1. Clique percolation 

In clique percolation, an n-clique (a set of n nodes, all mutually connected) is located within a graph [7]. Then, all n-cliques which share n − 1 nodes with the first clique are identified. This process continues, and a community is defined as the set of all nodes reachable by this percolation process. For both communities A and B, the percolating clique will enter the overlap and detect these nodes. Once that happens, the percolating clique can "jump" to the other community, and the two communities will merge together.



2. Betweenness 

When using betweenness to detect communities, shortest paths (along graph edges) are drawn between all pairs of nodes [13]. The edges with the most shortest paths falling through them are considered the boundary between communities, and are virtually removed, in sequence, until isolated communities remain. An external criterion (such as modularity) is used to determine the stopping point of this process. In the dense overlap viewpoint, there is no region with fewer edges, which means that edges at a community boundary never have many paths focused through them. Betweenness is thus unable to determine community boundaries.



3. Other algorithms 

Thought experiments similar to the above can be performed for other community detection algorithms. While YL claim that existing community detection algorithms cannot handle dense overlaps, certain methods can do so. As already explained above, the Absolute Potts Model is able to detect sparse overlaps [38]. Furthermore, various other local optimization algorithms are able to detect communities in the face of dense overlaps [5TJ [55].
Appendix C: F-score partition similarity metric

In order to quantify the performance of community detection algorithms, we must have tools to compare the similarity of two partitions (or, more generally, covers, which allow overlaps). Furthermore, one of the core tenets of our multi-resolution algorithm is that high uniformity across replicas indicates good community detection solutions. There are a variety of such functions derived from information theory which answer the question "if you know one partition of the system, what is known about the replicas?".

The F-score development discussed here has certain advantages. First, it can handle overlapping communities. It can also handle incomplete partitions, where the communities being detected or sought consist of only a portion of the nodes of the entire graph, as opposed to other functions which only compare complete partitions of the graph. Thus, the F-score is a valuable tool for studying local community detection. The F-score has the interpretation that F = 1 implies we have exactly recovered a known community, with every node detected and no extraneous nodes detected.



FIG. 9: Example for calculation of F-score. (a) We have a known community A (green nodes), and the results from the community detection algorithm A' overlaid (dashed oval). (b) Calculation of precision: 7/10 detected nodes are correct. (c) Calculation of recall: 7/12 correct nodes are detected.



1. Single-community F-score 

As an example, let us consider the situation depicted in Fig. 9. We have a community A which we want to detect. We apply some algorithm and end up with a group of nodes A' (dark and green). There are 12 nodes in A, |A| = 12. Our algorithm has returned 10 nodes, |A'| = 10. We have an overlap of seven nodes, which means three nodes were detected which are incorrect, and we missed five nodes we should have detected.

We will consider our initial development to be for a single community (in other contexts, the F-score is defined only for single-group searching). Using some local community detection algorithm, we detect a group of nodes. We use the phrase "community" to indicate the known group of nodes we want to match. We use the term "detected nodes" to indicate what the community detection algorithm actually returns. Note that there are few published local community detection algorithms of this type (ones that return only a single community in isolation, as opposed to a partition/cover of the entire system). This is not a practical limitation, as we develop methods of averaging to compensate for this.

There are two components to the F-score. First is the precision, measuring how many nodes in the test community actually belong in the known community. It answers the question "of all nodes detected, how many of them are relevant to understanding the characteristic of the community?" Precision is defined as

$$\mathrm{precision}(A, A') = \frac{\text{correct nodes returned}}{\text{total nodes returned}} = \frac{|A \cap A'|}{|A'|}. \qquad (C1)$$

The nomenclature |A| indicates the number of nodes in A, and ∩ indicates set intersection. In Fig. 9, precision = 7/10. A precision of 1 indicates that every node detected is in the community. A precision of zero means that no detected nodes are in the community. A lower precision indicates more false positives.

Next, the recall indicates what fraction of the community nodes were actually detected. It answers the question "of all nodes in the community (that we want to detect), what fraction were detected?" Recall is defined as

$$\mathrm{recall}(A, A') = \frac{\text{correct nodes returned}}{\text{total community nodes}} = \frac{|A \cap A'|}{|A|}. \qquad (C2)$$

In Fig. 9, recall = 7/12. A recall of 1 indicates that we have managed not to miss any nodes in the desired community (though there could be excess nodes returned, too). A recall of less than one indicates false negatives.

Generally, there is a trade-off between precision and 
recall. Precision can be made 1 by selecting fewer nodes 
(in the limiting case, by selecting only one node which 
is known to be in the community), at the cost of a very 
low recall. Conversely, recall can be made 1 by selecting 
every node in the graph, at the expense of a low precision. 
Overall goodness of our search process is measured in the form of the F-score, the weighted harmonic mean of precision and recall,

$$F_\beta(A, A') = (1 + \beta^2)\,\frac{\mathrm{prec}(A, A')\,\mathrm{recall}(A, A')}{\beta^2\,\mathrm{prec}(A, A') + \mathrm{recall}(A, A')}. \qquad (C3)$$



The parameter β weights the relative importance of precision and recall, with a higher β weighting recall more. This selectivity has a benefit in community detection work: if the user prefers ensuring detection of all nodes at the possible cost of false positives, or the converse, that can be accommodated.

In our work, we will use $F_1$, the balanced score. $F_1$ is the harmonic mean of precision and recall,

$$F_1(A, A') = 2\,\frac{\mathrm{prec}(A, A')\,\mathrm{recall}(A, A')}{\mathrm{prec}(A, A') + \mathrm{recall}(A, A')}. \qquad (C4)$$
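Eqs. (C1) to (C4) translate directly into set operations. The sketch below uses arbitrary node labels chosen only to reproduce the counts of Fig. 9 (7 of 10 detected nodes correct, 7 of 12 community nodes detected); the function names are our own.

```python
def precision(A, A_det):
    """Eq. (C1): |A & A'| / |A'|, the share of detected nodes that are correct."""
    return len(A & A_det) / len(A_det)


def recall(A, A_det):
    """Eq. (C2): |A & A'| / |A|, the share of community nodes that were detected."""
    return len(A & A_det) / len(A)


def f_beta(A, A_det, beta=1.0):
    """Eq. (C3); beta = 1 gives the balanced F_1 of Eq. (C4)."""
    p, r = precision(A, A_det), recall(A, A_det)
    if p == 0 and r == 0:
        return 0.0  # disjoint sets: define F = 0 instead of dividing by zero
    return (1 + beta**2) * p * r / (beta**2 * p + r)


A = set(range(12))         # known community, |A| = 12 (illustrative labels)
A_det = set(range(5, 15))  # detected nodes, |A'| = 10; nodes 5..11 lie in A
print(precision(A, A_det), recall(A, A_det), f_beta(A, A_det))  # ~0.70, ~0.58, ~0.64
```

Because recall is the weaker component in this example, weighting recall more heavily (β = 2) gives a lower score than $F_1$, while swapping A and A' leaves $F_1$ unchanged.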
FIG. 10: Trade-off between precision and recall. At low mean community size n, our detected communities are too small and we have high precision but low recall. At larger community size, our detected communities are too large and we have low precision but high recall. When we properly detect the correct communities, both precision and recall are high, and we have $F_1 = 1$. In this sample graph, actual communities have a size of n = 32, which we can observe from the peak of $F_1$. This figure actually plots partition-level scores with respect to the known communities, which are explained in the following sections, but conceptually the results are the same. Tested on a q = 4, n = 32 SBM graph with $p_{in} = 0.5$ and $p_{out} = 0.1$.



The score $F_\beta$ is computed between one set of nodes (the known "community" A) and another set (the "detected nodes" A'), and for β ≠ 1 it is non-symmetric in A and A'. For β = 1, $F_1$ is symmetric, with precision and recall swapping values when A and A' are swapped.

The overall F-score in Fig. 9 is

$$F_1 = \frac{2\,|A \cap A'|}{|A| + |A'|} = \frac{2 \cdot 7}{10 + 12} \approx 0.64.$$

Note that the F-score does not depend on the total number of nodes in the graph. This means our F-score scale does not change as we increase the total graph size (at approximately constant community size), providing theoretical advantages for local community detection.



2. Partition F-score 

In order to compare two partitions, we take one partition as the "known" community assignment we want to match (α) and another partition (α') as the result from our (now global) community detection algorithm. We take an average of $F_1$ scores,

$$F_1^P(\alpha, \alpha') = \frac{1}{|\alpha|} \sum_{A \in \alpha} \max_{A' \in \alpha'} F_1(A, A'). \qquad (C5)$$

In words, for every community A in the known partition, we compute the F-score with all communities A' in α', and choose the one which maximizes $F_1$ with A. We take the average value of all of these maxima. The number of communities in α is represented by |α|.

As defined above, $F_1^P$ is non-symmetric. In the final value, each community in α is used exactly once as the first argument to $F_1$, but the only communities in α' used as the second argument are those which maximize $F_1$ for at least one of the communities in α. There can be communities in α' which are left unused and do not affect the result.

It is extremely important to understand the non-symmetric nature of $F_1^P$. As it is posed above, $F_1^P$ is a useful metric if every known community is matched by at least one detected community. It answers the question "Is every community detected at least once?". A distinct question is "does every community in α' correspond to at least one real community?". To answer this second question, we swap the roles of α and α' in $F_1^P$ and instead compute $F_1^P(\alpha', \alpha)$. Just as precision and recall both have their uses in understanding the community detection process, $F_1^P(\alpha', \alpha)$ and $F_1^P(\alpha, \alpha')$ each reveal different and useful properties of our minimization: one answers the question "is every community detected at least once?", the other answers "does every detected community represent a real community?"

Naively, $F_1^P$ is an $O(|\alpha||\alpha'|)$ calculation, because we must compare every community in α to every community in α'. There is potential for optimization by first generating a list of only the pairs of communities that overlap. Each actual $F_1$ evaluation consists only of the operations of set intersection and set cardinality, which can be made efficient.
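Eq. (C5) and the overlap-only optimization can be combined in a short sketch: an inverted index from nodes to the detected communities containing them restricts each maximum to detected communities that share at least one node with A, which is safe because disjoint sets have $F_1 = 0$. The function names are ours, communities are assumed to be non-empty sets, and $F_1$ is evaluated through the equivalent form $2|A \cap A'|/(|A| + |A'|)$.

```python
from collections import defaultdict


def f1(A, B):
    """F_1 of two node sets; 2PR/(P+R) simplifies to 2|A & B| / (|A| + |B|)."""
    return 2 * len(A & B) / (len(A) + len(B))


def partition_f1(alpha, alpha_prime):
    """F_1^P(alpha, alpha') of Eq. (C5), comparing only overlapping pairs."""
    alpha_prime = list(alpha_prime)
    index = defaultdict(set)  # node -> indices of detected communities holding it
    for j, B in enumerate(alpha_prime):
        for v in B:
            index[v].add(j)
    total = 0.0
    for A in alpha:
        # Detected communities sharing no node with A have F_1 = 0 and are skipped.
        touching = set().union(*(index.get(v, ()) for v in A))
        total += max((f1(A, alpha_prime[j]) for j in touching), default=0.0)
    return total / len(alpha)


known = [{0, 1, 2, 3}, {4, 5, 6, 7}]
detected = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9}]  # one extra, spurious community
print(partition_f1(known, detected))  # 1.0: every known community is found
print(partition_f1(detected, known))  # ~0.667: {8, 9} matches nothing real
```

The two calls illustrate the asymmetry discussed above: the spurious community does not affect $F_1^P(\alpha, \alpha')$ but lowers $F_1^P(\alpha', \alpha)$.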

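Analogously, we could define partition precision/recall as

$$\mathrm{prec}^P(\alpha, \alpha') = \frac{1}{|\alpha|} \sum_{A \in \alpha} \max_{A' \in \alpha'} \mathrm{prec}(A, A'), \qquad (C6)$$

$$\mathrm{recall}^P(\alpha, \alpha') = \frac{1}{|\alpha|} \sum_{A \in \alpha} \max_{A' \in \alpha'} \mathrm{recall}(A, A'). \qquad (C7)$$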



A low precision generally indicates that communities are being detected too large or too liberally. A low recall generally indicates that communities are being detected too small or too conservatively. This can provide valuable information to monitor the performance of our algorithm. Note that $F_1^P$ is not commutative with respect to the averaging and the precision/recall; in general,

$$F_1^P(\alpha, \alpha') \neq 2\,\frac{\mathrm{prec}^P(\alpha, \alpha')\,\mathrm{recall}^P(\alpha, \alpha')}{\mathrm{prec}^P(\alpha, \alpha') + \mathrm{recall}^P(\alpha, \alpha')}. \qquad (C8)$$
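To see Eq. (C8) concretely, the sketch below averages precision, recall, and $F_1$ separately over a known partition using the pattern of Eqs. (C5) to (C7), for a hypothetical detection that splits one known community in two. The helper names are ours.

```python
def prec(A, B):
    """Precision of detected set B against known set A (Eq. C1)."""
    return len(A & B) / len(B)


def rec(A, B):
    """Recall of detected set B against known set A (Eq. C2)."""
    return len(A & B) / len(A)


def f1(A, B):
    """Harmonic mean of precision and recall (Eq. C4); 0 for disjoint sets."""
    p, r = prec(A, B), rec(A, B)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def partition_avg(score, alpha, alpha_prime):
    """(1/|alpha|) * sum over A in alpha of max over A' in alpha' of score(A, A')."""
    return sum(max(score(A, B) for B in alpha_prime) for A in alpha) / len(alpha)


known = [{0, 1, 2, 3}, {4, 5, 6, 7}]
detected = [{0, 1}, {2, 3}, {4, 5, 6, 7}]  # first known community split in two
p_P = partition_avg(prec, known, detected)  # 1.0
r_P = partition_avg(rec, known, detected)   # 0.75: communities found too small
f_P = partition_avg(f1, known, detected)    # 5/6, about 0.833
print(f_P, 2 * p_P * r_P / (p_P + r_P))     # about 0.833 versus 6/7, about 0.857
```

The low partition recall with perfect partition precision is the "detected too small" signature described above, and the averaged $F_1$ differs from the harmonic mean of the averaged precision and recall.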



3. F-score applications 

We apply $F_1^P$ in two ways: first, to compare the uniformity of a handful of replicas (results from community detection applied to the same graph with different initial conditions/random seeds); second, to compare partitions with a known state. For these, we assume that we have r replicas $\alpha_1, \alpha_2, \ldots, \alpha_r$.

First, we can use $F_1^P$ as a measure of inter-replica uniformity in our multi-resolution algorithm. This takes the place of the variation of information, normalized mutual information, or N measure. For this, we calculate $F_1^P$ with respect to every pair of replicas,

$$F_1^{IR} = \frac{1}{r(r-1)} \sum_{\alpha \neq \beta} F_1^P(\alpha, \beta), \qquad (C9)$$

for the partitions α and β in the replicas. This measure is 1 when all replicas are identical. Note that we sum over both argument orders (α, β) and (β, α) due to the non-symmetric nature of the $F_1^P$ measure.

FIG. 11: Comparison between $F_1^{IR}$ and $F_1^0$ on a q = 20, n = 20 SBM graph. We see that the measures initially reach a maximum at the same point, n = 20, allowing us to use $F_1^{IR}$ to infer that communities have a size of 20 nodes. The n = 400 maximum of $F_1^{IR}$ is the trivial result that when one giant community is detected, the result is very uniform.

Next, we can form a version of this measure which detects similarity to a known structure, which we designate as $\alpha_0$. The $F_1^P$ score with respect to a known configuration is defined as

$$F_1^{0} = \frac{1}{r} \sum_{\alpha} F_1^P(\alpha_0, \alpha), \qquad (C10)$$

for all replicas α. Due to the asymmetric nature of $F_1^P$, this measure indicates the extent to which all known communities are detected in replicas, but not the extent to which all detected communities represent real structure. The asymmetric nature of $F_1^P$ here is beneficial, because we can handle cases where we partition the entire graph into communities, but can also search for only a few communities within a larger graph. In Fig. 11 we show that $F_1^{IR}$ is useful for inferring $F_1^0$, and in Fig. 12 we show that $F_1^0$ behaves similarly to other partition comparison functions.
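A sketch of the two replica-level scores follows. We assume that the inter-replica score averages $F_1^P$ over all r(r - 1) ordered replica pairs, which is consistent with summing over both argument orders and with the value 1 for identical replicas, and that the score against a known partition averages $F_1^P(\alpha_0, \alpha)$ over the r replicas with the known partition as first argument. These normalizations and the function names are assumptions made for the example; communities are assumed to be non-empty.

```python
from itertools import permutations


def f1(A, B):
    """F_1 of two non-empty node sets: 2|A & B| / (|A| + |B|)."""
    return 2 * len(A & B) / (len(A) + len(B))


def partition_f1(alpha, alpha_prime):
    """F_1^P(alpha, alpha') of Eq. (C5)."""
    return sum(max(f1(A, B) for B in alpha_prime) for A in alpha) / len(alpha)


def f_inter_replica(replicas):
    """Average F_1^P over ordered replica pairs (alpha, beta) with alpha != beta."""
    pairs = list(permutations(replicas, 2))
    return sum(partition_f1(a, b) for a, b in pairs) / len(pairs)


def f_known(known, replicas):
    """Average F_1^P(known, alpha) over the replicas alpha."""
    return sum(partition_f1(known, a) for a in replicas) / len(replicas)


known = [{0, 1, 2, 3}, {4, 5, 6, 7}]
giant = [set(range(8))]  # a single giant community
print(f_inter_replica([known, giant]), f_known(known, [known, giant]))  # ~0.667, ~0.833
print(f_inter_replica([giant, giant, giant]))  # 1.0: trivially uniform
```

The last line reproduces in miniature the trivial maximum noted for Fig. 11: replicas that all return one giant community agree perfectly, so a high inter-replica score alone does not certify a good partition.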



4. Discussion 

The F-score is a tool for comparing partitions that shares some similarity with already existing measures, but with the possibility of further extension to local community detection and overlapping nodes. By combining averaging, precision, recall, and F-scores in different orders, it provides additional insight into the internal workings of community detection methods.

FIG. 12: Comparison of $F_1^0$ and other partition measures on the same q = 4, n = 32 SBM graph. We see that all measures contain extrema at the same community size (n), validating that all measures convey approximately the same information on this graph. We compare mutual information I [38, 75], normalized mutual information $I_N$ [38, 75], and variation of information V [38, 73, 74].